
data subTLDR week 46 year 2025

r/MachineLearning, r/dataengineering, r/SQL

Practicing SQL with Real Databases: Community's Top Suggestions, Preferred Database Software and SQL Studios, Simplifying BI Dashboards, and Lessons from Real-time Factory Telemetry

Posted in r/MachineLearning by u/jacobgorm, 11/13/2025
278

[R] LeJEPA: New Yann Lecun paper

Research
The new paper by renowned AI researcher Yann LeCun, which introduces a novel objective called LeJEPA, sparked significant interest. Commenters praised its theoretical grounding and its potential benefits, such as a single trade-off hyperparameter and stability across hyperparameters, although some found the theory heavy to digest. Others raised concerns about the efficiency of the concept of views and about the applicability of JEPA to convnets. Some found the paper humbling and inspiring, while others had reservations about its practical implementation. Overall, the sentiment was positive and reflected deep respect for LeCun's contributions to the field.
31 comments
Posted in r/dataengineering by u/one-step-back-04, 11/11/2025
220

DON’T BE ME !!!!!!!

Discussion
The thread discusses a common mistake: overcomplicating business intelligence (BI) dashboards when clients simply want something usable and straightforward. A key insight is that detailed dashboards may look impressive, but they add little value unless they directly support decision-making. Many participants agreed that clients usually prefer targeted, actionable information over extensive data. Communication and an understanding of the industry and the people involved are also essential in BI. Some users noted that simple Excel tables often suffice, because users want to sort and analyze data according to their own needs. The sentiment was generally positive, with participants sharing lessons learned from their own experience.
43 comments
Posted in r/MachineLearning by u/Technical_Proof6082, 11/10/2025
183

[D] ICLR 2026 Paper Reviews Discussion

Discussion
The ICLR 2026 paper review discussion has elicited mixed reactions, reflecting anticipation, skepticism, and humor. Many are eagerly refreshing the page in hopes of seeing their reviews, despite acknowledging the likelihood of AI-written responses. There's criticism of the review process, with some questioning the quality of reviews and others mocking rushed last-minute evaluations. A sense of solidarity is also apparent, highlighting the shared struggle of waiting and wishing each other luck. Frustrations about dataset selection for AI research and technical issues with the website are also expressed. The overall sentiment leans towards camaraderie in the face of an imperfect review process.
792 comments
Posted in r/MachineLearning by u/BetterbeBattery, 11/12/2025
176

[D] <ICLR review comment> Is this real?

Research
The discussions revolve around a controversial paper review and the author's response. The majority opinion, represented by the most upvoted comment, confirms the review's authenticity. Many criticize the paper's presentation and formatting issues, citing unprofessionalism in structure and design. Despite these criticisms, there is also agreement that the paper's scientific merit doesn't warrant an extremely low score. The author's response, seen as inappropriate, drew additional criticism. Some commenters suggest that the paper should be rejected, and there's a proposal for a ban. Overall, the sentiment is negative towards both the paper and the author's reaction.
25 comments
Posted in r/dataengineering by u/muttibaaz, 11/14/2025
87

streaming telemetry from 500+ factory machines to cloud in real time, lessons from 2 years running this setup

Discussion
The poster's team built a real-time monitoring system for over 500 factory machines that generates about 2 million data points daily. Initial attempts based on MQTT brokers and Kafka clusters were unsuccessful because of scalability and management issues. The eventual solution was a simpler system that prioritized reliable data collection and tolerated network failures, running messaging software on inexpensive hardware at each factory. This edge-first approach, in which devices work independently and sync when possible, proved effective. However, the post was criticized for being unclear and for not explaining the implemented solution in detail, and users asked for more specific information.
27 comments
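The edge-first idea summarized above (store locally, sync when the network allows) can be sketched roughly as follows. This is not the poster's implementation, which the post does not describe in detail: it swaps in a local SQLite table as the on-device queue, and the names `EdgeBuffer`, `record`, `sync`, and the `send` callback are all hypothetical.

```python
import json
import sqlite3
import time
from typing import Callable, Dict, List


class EdgeBuffer:
    """Hypothetical store-and-forward queue: persist first, upload later."""

    def __init__(self, path: str = ":memory:") -> None:
        # On a real device this would be a file path so data survives restarts.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS outbox ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def record(self, machine_id: str, metric: str, value: float) -> None:
        """Write a reading to local storage, regardless of network state."""
        payload = json.dumps(
            {"machine_id": machine_id, "metric": metric,
             "value": value, "ts": time.time()}
        )
        self.db.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))
        self.db.commit()

    def pending(self) -> int:
        """Number of readings not yet uploaded."""
        return self.db.execute("SELECT COUNT(*) FROM outbox").fetchone()[0]

    def sync(self, send: Callable[[List[Dict]], None], batch_size: int = 100) -> int:
        """Try to upload the oldest batch; keep it if the upload fails."""
        rows = self.db.execute(
            "SELECT id, payload FROM outbox ORDER BY id LIMIT ?", (batch_size,)
        ).fetchall()
        if not rows:
            return 0
        batch = [json.loads(payload) for _, payload in rows]
        try:
            send(batch)  # assumed to raise OSError (e.g. ConnectionError) when offline
        except OSError:
            return 0  # network is down; rows stay queued for the next attempt
        # Rows were taken in id order, so everything up to the last id was sent.
        self.db.execute("DELETE FROM outbox WHERE id <= ?", (rows[-1][0],))
        self.db.commit()
        return len(rows)
```

A caller would invoke `record` for every reading and call `sync` on a timer. Because rows are deleted only after `send` returns, a failed upload loses nothing, at the cost of possible duplicates if a send succeeds but the delete does not complete.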
Posted in r/SQL by u/Awkward_Affect_1941, 11/14/2025
78

Hi I just want to know where I can practice sql with a real database?

SQL Server
The community recommends several resources for practicing SQL with a real database. Microsoft's AdventureWorks database is a popular choice because it is widely used in practice exercises. Kaggle, Data.gov, and SandboxSQL (SQLite) are also useful for accessing real datasets. Other options include TPC-DS via shell.duckdb.org and Google's BigQuery for easy access to public datasets. Northwind, another Microsoft database, is also recommended, and the W3Schools SQL test database is suggested for beginners. Users can also modify these databases to suit their own needs.
37 comments
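For readers who want to try a query before downloading any of the datasets above, a minimal sketch using Python's built-in `sqlite3` module is shown here. The `customers` and `orders` tables and their rows are made-up placeholders, not part of AdventureWorks, Northwind, or any other dataset named in the thread.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical practice schema and rows (placeholders, not real data).
conn.executescript(
    """
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders (customer_id, amount)
    VALUES (1, 20.0), (1, 35.5), (2, 12.0);
    """
)

# A typical practice query: join, aggregate, and sort.
totals = conn.execute(
    """
    SELECT c.name, COUNT(o.id) AS order_count, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY total DESC
    """
).fetchall()

for name, order_count, total in totals:
    print(f"{name}: {order_count} orders, {total:.2f} total")
```

The same `SELECT` should carry over to the larger sample databases once the table and column names are adjusted to their schemas.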
Posted in r/SQL by u/IllustratorSalty9753, 11/10/2025
62

best database software

Discussion
The consensus among Reddit users is that Postgres is the recommended default, thanks to its reliability, scalability, and balance of performance and simplicity, unless there is a specific reason to use something else. SQL Server and MySQL are also mentioned, although some commenters caution against Oracle and MySQL over concerns about data integrity and silent failures. SQLite is considered unsuitable for web-scale projects or multiple connections but a good fit for single-user systems. Commenters also stressed matching the database to the project's expected scale.
44 comments
Posted in r/SQL by u/Koch-Guepard, 11/13/2025
43

What is the best SQL Studio ?

PostgreSQL
A discussion of the best SQL studios revealed a preference for DBeaver, SSMS, SQL Developer, and DataGrip, with some users also mentioning VS Code and TOAD. DataGrip by JetBrains was particularly praised, and commenters highlighted that it recently became free for personal use. VS Code was appreciated for its versatility, while DBeaver was recommended for working across different RDBMSs. Some users also found the command-line interface and DuckDB user-friendly. The general sentiment was positive, emphasizing the range of tools available to suit individual preferences and needs, with no clear consensus on a single best SQL studio.
60 comments
