data subTLDR week 46 year 2025
r/MachineLearning, r/dataengineering, r/SQL
Practicing SQL with Real Databases: Community's Top Suggestions, Preferred Database Software and SQL Studios, Simplifying BI Dashboards, and Lessons from Real-time Factory Telemetry
Posted in r/MachineLearning by u/jacobgorm • 11/13/2025
278
[R] LeJEPA: New Yann Lecun paper
Research
The new paper by renowned AI researcher Yann LeCun, introducing a novel objective called LeJEPA, has sparked significant interest. Commenters praised its theoretical grounding and its practical benefits, such as a single trade-off hyperparameter and stability across hyperparameter settings. However, some users found the theory heavy to digest. There were also concerns about the efficiency of the concept of views and the applicability of JEPA to convnets. While some found the paper humbling and inspiring, others expressed reservations about its practical implementation. Overall, the sentiment was positive, reflecting a deep respect for LeCun's contributions to the field.
Posted in r/dataengineering by u/one-step-back-04 • 11/11/2025
220
DON’T BE ME !!!!!!!
Discussion
The thread discusses the common mistake of overcomplicating business intelligence (BI) dashboards when clients simply want something usable and straightforward. A key insight is that while detailed dashboards might seem impressive, they often fail to add value if they don't directly support decision-making. Many participants agreed that clients usually prefer targeted, actionable information over extensive data. Clear communication and an understanding of the industry and the people involved are also essential in BI. Furthermore, some users noted that simple Excel tables often suffice, because users want to sort and analyze data according to their own needs. The sentiment was generally positive, with participants sharing lessons learned from their experiences.
Posted in r/MachineLearning by u/Technical_Proof6082 • 11/10/2025
183
[D] ICLR 2026 Paper Reviews Discussion
Discussion
The ICLR 2026 paper review discussion has elicited mixed reactions, reflecting anticipation, skepticism, and humor. Many are eagerly refreshing the page in hopes of seeing their reviews, despite acknowledging the likelihood of AI-written responses. There's criticism of the review process, with some questioning the quality of reviews and others mocking rushed last-minute evaluations. A sense of solidarity is also apparent, highlighting the shared struggle of waiting and wishing each other luck. Frustrations about dataset selection for AI research and technical issues with the website are also expressed. The overall sentiment leans towards camaraderie in the face of an imperfect review process.
Posted in r/MachineLearning by u/BetterbeBattery • 11/12/2025
176
[D] <ICLR review comment> Is this real?
Research
The discussions revolve around a controversial paper review and the author's response. The majority opinion, represented by the most upvoted comment, confirms the review's authenticity. Many criticize the paper's presentation and formatting issues, citing unprofessionalism in structure and design. Despite these criticisms, there is also agreement that the paper's scientific merit doesn't warrant an extremely low score. The author's response, seen as inappropriate, drew additional criticism. Some commenters suggest that the paper should be rejected, and there's a proposal for a ban. Overall, the sentiment is negative towards both the paper and the author's reaction.
Posted in r/dataengineering by u/muttibaaz • 11/14/2025
87
streaming telemetry from 500+ factory machines to cloud in real time, lessons from 2 years running this setup
Discussion
The team built a real-time monitoring system for over 500 factory machines, generating about 2 million data points daily. Initial attempts at implementing this system using MQTT brokers and Kafka clusters proved unsuccessful due to scalability and management issues. The solution was a simplified system that prioritized reliable data collection and could handle network failures, using messaging software on inexpensive hardware at each factory. This edge-first approach, where devices work independently and sync when possible, proved effective. However, the post faced criticism for its lack of clarity and detailed explanation of the solution implemented, with users asking for more specific information.
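The edge-first idea summarized above (collect locally, survive network failures, sync when possible) is often implemented as a store-and-forward buffer. The following is a minimal sketch of that pattern, not the poster's actual system: the `EdgeBuffer` class, its table layout, and the `send` callback are all hypothetical, and SQLite is used here only as an illustrative durable local store.

```python
import sqlite3
import time
from typing import Callable, List, Optional, Tuple

Row = Tuple[int, str, float, float]  # (id, machine_id, ts, value)


class EdgeBuffer:
    """Hypothetical store-and-forward queue: write locally first, upload later."""

    def __init__(self, path: str = ":memory:") -> None:
        # Use a file path on a real device so buffered readings survive restarts.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS readings ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "machine_id TEXT NOT NULL, ts REAL NOT NULL, value REAL NOT NULL)"
        )
        self.db.commit()

    def record(self, machine_id: str, value: float, ts: Optional[float] = None) -> None:
        """Persist one reading locally, regardless of network state."""
        when = time.time() if ts is None else ts
        self.db.execute(
            "INSERT INTO readings (machine_id, ts, value) VALUES (?, ?, ?)",
            (machine_id, when, value),
        )
        self.db.commit()

    def pending(self) -> int:
        """Number of readings not yet uploaded."""
        return self.db.execute("SELECT COUNT(*) FROM readings").fetchone()[0]

    def sync(self, send: Callable[[List[Row]], None], batch_size: int = 100) -> int:
        """Upload buffered rows oldest-first; rows stay buffered if sending fails."""
        sent = 0
        while True:
            rows = self.db.execute(
                "SELECT id, machine_id, ts, value FROM readings ORDER BY id LIMIT ?",
                (batch_size,),
            ).fetchall()
            if not rows:
                return sent
            try:
                send(rows)  # assumed to raise ConnectionError when offline
            except ConnectionError:
                return sent  # keep the data and retry on the next sync attempt
            # Delete only after a successful send, so nothing is lost mid-flight.
            self.db.execute("DELETE FROM readings WHERE id <= ?", (rows[-1][0],))
            self.db.commit()
            sent += len(rows)
```

Because rows are deleted only after `send` returns, a failure between sending and deleting can cause a batch to be uploaded twice; a receiver in a design like this would typically deduplicate on the reading `id` or timestamp.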
Posted in r/SQL by u/Awkward_Affect_1941 • 11/14/2025
78
Hi I just want to know where I can practice sql with a real database?
SQL Server
The community recommends several resources for practicing SQL with a real database. Microsoft's AdventureWorks database is a popular choice because it is widely used in practice exercises. Kaggle, Data.gov, and SandboxSQL (SQLite) are also useful for accessing real datasets. Other options include TPC-DS via shell.duckdb.org and Google's BigQuery for easy access to public datasets. Northwind, another Microsoft sample database, is also recommended. The W3Schools SQL test database is suggested for beginners. Users can also modify these databases to suit their own practice needs.
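For readers who want to try a query before downloading one of the sample databases above, here is a minimal sketch using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their rows are invented for illustration; they are not the actual AdventureWorks or Northwind schema.

```python
import sqlite3

# In-memory database: nothing is written to disk, so it is safe to experiment.
conn = sqlite3.connect(":memory:")

# Hypothetical two-table schema, loosely in the spirit of an orders database.
conn.executescript(
    """
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total       REAL NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 45.5);
    """
)

# LEFT JOIN keeps customers with no orders; COALESCE turns their NULL sum into 0.
query = """
    SELECT c.name,
           COUNT(o.id)               AS order_count,
           COALESCE(SUM(o.total), 0) AS revenue
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY revenue DESC, c.name
"""
rows = conn.execute(query).fetchall()

for name, order_count, revenue in rows:
    print(f"{name}: {order_count} orders, {revenue:.2f} total")
```

The same join, grouping, and NULL-handling patterns carry over to the larger sample databases, although SQL dialects differ in details between SQLite, SQL Server, and PostgreSQL.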
Posted in r/SQL by u/IllustratorSalty9753 • 11/10/2025
62
best database software
Discussion
The consensus among Reddit users is that Postgres is the recommended database, thanks to its reliability, scalability, and balance of performance and simplicity. It's considered the default choice unless there's a specific reason to use another. SQL Server and MySQL also receive mentions, but some commenters caution against Oracle and MySQL over concerns about data integrity and silent failures. SQLite is deemed unsuitable for web-scale projects or many concurrent connections, although it works well for single-user systems. The importance of matching the database to the project's expected scale was also highlighted.
Posted in r/SQL by u/Koch-Guepard • 11/13/2025
43
What is the best SQL Studio ?
PostgreSQL
A discussion of the best SQL studios revealed a preference for DBeaver, SSMS, SQL Developer, and DataGrip, with some users also mentioning VS Code and TOAD. DataGrip by JetBrains was particularly praised, and its recent availability for free personal use was highlighted. VS Code's versatility was appreciated, while DBeaver was recommended for working across different RDBMSs. Some users also described the command-line interface and DuckDB as user-friendly. The general sentiment was positive, emphasizing the range of tools available to suit individual preferences and needs, with no clear consensus on a single best SQL studio.