data subTLDR week 8 year 2026
r/MachineLearning • r/dataengineering • r/SQL
Mastering SQL: Tips from Veterans, Solutions for Non-SQL Professionals, Best Practice Websites for Interview Prep, Excitement Over 'Designing Data-Intensive Applications' 2nd Edition, Debating Data Lakes' Efficacy
Week 8, 2026
Posted in r/dataengineering by u/sspaeti • 2/18/2026
945
Designing Data-Intensive Applications - 2nd Edition out next week
Blog
The second edition of Designing Data-Intensive Applications has sparked enthusiasm among readers, even though some admit they may not read it cover to cover. The book, widely appreciated for its breakdown of data handling at various scales, is seen as a valuable resource for understanding the logic behind big data management tools and strategies, and for designing reliable and predictable software with an enterprise mindset. Some readers suggest using it as a reference, focusing on relevant chapters rather than attempting a full read-through. Others admit the dense content can be overwhelming.
Posted in r/dataengineering by u/wtfzambo • 2/17/2026
439
In 6 years, I've never seen a data lake used properly
Discussion
Many professionals express dissatisfaction with the implementation of data lakes in their organizations, citing increased infrastructure costs and a lack of practical value. Critics argue that the concept of a data lake, while appealing in theory, often leads to poorly managed and ineffective data storage systems. However, some see value in using a combination of data lake and data warehouse strategies, highlighting the utility of data lakes as a flexible, cost-effective landing zone for raw data. There is also an acknowledgment that the challenges with data lakes are more behavioral than technological, pointing to the need for better data management practices. Overall, the sentiment is mixed, with an emphasis on the need for careful evaluation before deciding on a data lake strategy.
Posted in r/MachineLearning by u/NoAdministration6906 • 2/18/2026
258
[D] We tested the same INT8 model on 5 Snapdragon chipsets. Accuracy ranged from 93% to 71%. Same weights, same ONNX file.
Discussion
On-device testing of a single INT8 model across five Snapdragon chipsets revealed large accuracy differences, ranging from 71% to 93%, even though the weights and the ONNX file were identical. The discrepancy is attributed to differences in NPU precision handling, operator fusion, and memory-constrained fallback across chipsets. These issues do not appear in cloud-based benchmarks and only become visible on real hardware. The discussion highlighted the need for real hardware in the CI pipeline, deployment-aware model training, and a closer examination of INT8 rounding behavior. The overall sentiment warns against assuming uniform hardware implementations and against underestimating how much on-device results can vary from cloud benchmarks.
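To make the point about INT8 rounding behavior concrete, here is a minimal Python sketch of how the rounding rule alone can change a prediction. Everything in it is an illustrative assumption rather than anything from the original post: the three rounding modes, the activation values, the toy two-class weights, and the scale are hypothetical, and the sketch does not describe how any particular Snapdragon NPU actually rounds.

```python
import math

def quantize(x: float, scale: float, rounding: str) -> int:
    """Map a float to a signed INT8 value under a given rounding rule."""
    v = x / scale
    if rounding == "half_even":
        q = round(v)  # Python's round() rounds halves to the nearest even integer
    elif rounding == "half_away":
        q = int(math.copysign(math.floor(abs(v) + 0.5), v))  # halves go away from zero
    elif rounding == "truncate":
        q = math.trunc(v)  # drop the fractional part
    else:
        raise ValueError(f"unknown rounding mode: {rounding}")
    return max(-128, min(127, q))  # saturate to the INT8 range

def predict(activations, weights, scale, rounding):
    """Quantize the activations, compute integer logits, return the argmax class."""
    q = [quantize(a, scale, rounding) for a in activations]
    logits = [sum(w * v for w, v in zip(row, q)) for row in weights]
    return logits.index(max(logits))

# Hypothetical inputs chosen so that every value lands exactly on a .5 boundary.
ACTIVATIONS = [1.25, 0.75, 2.25]
WEIGHTS = [[3, 0, 0],   # class 0
           [0, 4, 0]]   # class 1
SCALE = 0.5

for mode in ("half_even", "half_away", "truncate"):
    print(mode, predict(ACTIVATIONS, WEIGHTS, SCALE, mode))
```

With these toy numbers, the same weights and inputs yield class 1 under round-half-to-even and class 0 under the other two rules. Real models are far less sensitive per value, but small disagreements of this kind can accumulate across layers.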
Posted in r/MachineLearning by u/hcarlens • 2/19/2026
198
[R] Analysis of 350+ ML competitions in 2025
Research
Machine learning competitions in 2025 saw a shift in tools and methods. Gradient-boosted decision trees remained popular for tabular data, but AutoML packages and tabular foundation models (TabPFN) saw increased use. Language/reasoning competitions favored Qwen2.5 and Qwen3 models over BERT-style models. For the first time, Transformer-based models won more vision competitions than CNN-based ones. In audio contests involving human speech, OpenAI’s Whisper model was commonly fine-tuned. PyTorch was the dominant deep learning tool, with 20% of PyTorch-based solutions also using PyTorch Lightning. Despite its growing popularity in the data engineering community, Polars saw low uptake among competition winners.
Posted in r/dataengineering by u/daxdaxy • 2/18/2026
181
Microsoft UI betrayal
Meme
Microsoft's UI design, particularly in Azure Data Factory (ADF), drew notable criticism. Users called it poorly designed and not user-friendly, and some expressed regret over adopting the technology. Complaints ranged from tables filled with NULL values to problems with the UI of SSIS, another Microsoft product that was also deemed subpar. Some users, however, pointed out a few redeeming qualities, such as the quick 'copy activity' feature in ADF. Overall, the sentiment is highly negative, with users wanting more user-centric design in Microsoft's tools.
Posted in r/SQL by u/arrogant_definition • 2/19/2026
49
Dealing with professionals who don’t know SQL but need it.
PostgreSQL
A question about how to support professionals who need SQL but do not know it prompted a range of suggestions. The most upvoted included building basic reports or using Excel queries, as well as adopting Metabase, an open-source tool that connects to Postgres and offers a visual query builder. Another recommendation was TalkBI, an AI tool that simplifies data pulling and visualization. Some suggested PowerBI or Excel training for the team, while others proposed hiring a data or BI engineer. The overall sentiment was positive, with users offering various practical solutions.
Posted in r/SQL by u/katokk • 2/16/2026
21
Best websites to practice SQL to prep for technical interviews?
Discussion
DataLemur, StrataScratch, and Mode Analytics SQL Tutorial are popular platforms for SQL practice, especially for mid-level analyst roles. Window functions like lag/lead, row_number, and dense_rank are frequently encountered in interviews, so comfort with these concepts is essential. Practicing writing queries from scratch on a blank dataset, not just solving predefined problems, is also recommended. Additional platforms mentioned include tailoredsim.com, FreeSQL.com, and skillsql.com. Some users also suggest downloading personal data for practice or utilizing public datasets. The overall sentiment is positive, with users sharing various resources and strategies for SQL interview preparation.
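For readers who want to try the window functions named above on a blank dataset, here is a small self-contained sketch using Python's built-in sqlite3 module. The `sales` table, its columns, and its rows are invented for illustration and do not come from the thread; the sketch also assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

# In-memory database with a tiny, made-up sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (rep TEXT, month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("ana", 1, 100), ("ana", 2, 150), ("ana", 3, 150),
     ("bo", 1, 200), ("bo", 2, 120)],
)

QUERY = """
SELECT rep, month, amount,
       -- previous and next month's amount for the same rep (NULL at the edges)
       LAG(amount)  OVER (PARTITION BY rep ORDER BY month) AS prev_amount,
       LEAD(amount) OVER (PARTITION BY rep ORDER BY month) AS next_amount,
       -- unique position by amount; rep and month break ties deterministically
       ROW_NUMBER() OVER (ORDER BY amount DESC, rep, month) AS row_num,
       -- tied amounts share a rank, and no rank numbers are skipped
       DENSE_RANK() OVER (ORDER BY amount DESC) AS amount_rank
FROM sales
ORDER BY rep, month
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The two rows with an amount of 150 get different `row_num` values but the same `amount_rank`, which is the distinction interview questions on these functions tend to probe.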