data subTLDR week 8 year 2026

r/MachineLearning • r/dataengineering • r/SQL

Mastering SQL: Tips from Veterans, Solutions for Non-SQL Professionals, Best Practice Websites for Interview Prep, Excitement Over 'Designing Data-Intensive Applications' 2nd Edition, Debating Data Lakes' Efficacy

February 22, 2026 • Week 8, 2026

Posted in r/dataengineering by u/sspaeti • 2/18/2026

945

Designing Data-Intensive Applications - 2nd Edition out next week

Blog

The second edition of Designing Data-Intensive Applications has sparked enthusiasm among readers, despite some admitting they may not read it cover to cover. The book, widely appreciated for its breakdown of data handling at various scales, is seen as a valuable resource for understanding the logic behind big data management tools and strategies, and for designing reliable and predictable software with an enterprise mindset. Some readers suggest using it as a reference, focusing on relevant chapters rather than attempting a full read-through. However, others admit the book can be overwhelming due to its dense content.

108 comments

Posted in r/dataengineering by u/wtfzambo • 2/17/2026

439

In 6 years, I've never seen a data lake used properly

Discussion

Many professionals express dissatisfaction with the implementation of data lakes in their organizations, citing increased infrastructure costs and a lack of practical value. Critics argue that the concept of a data lake, while appealing in theory, often leads to poorly managed and ineffective data storage systems. However, some see value in using a combination of data lake and data warehouse strategies, highlighting the utility of data lakes as a flexible, cost-effective landing zone for raw data. There is also an acknowledgment that the challenges with data lakes are more behavioral than technological, pointing to the need for better data management practices. Overall, the sentiment is mixed, with an emphasis on the need for careful evaluation before deciding on a data lake strategy.

227 comments

Posted in r/MachineLearning by u/NoAdministration6906 • 2/18/2026

258

[D] We tested the same INT8 model on 5 Snapdragon chipsets. Accuracy ranged from 93% to 71%. Same weights, same ONNX file.

Discussion

On-device testing of an INT8 model across five Snapdragon chipsets revealed significant accuracy differences, ranging from 71% to 93% with the same weights and the same ONNX file. The discrepancy is attributed to differences in NPU precision handling and operator fusion, and to memory-constrained fallback, across chipsets. Notably, these issues do not appear in cloud-based benchmarks and are only visible when running on real hardware. The discussion highlighted the need for real hardware in the CI pipeline, deployment-aware model training, and a closer examination of INT8 rounding behavior. The overall sentiment warns against assuming that hardware implementations are uniform and against underestimating how much on-device results can vary from cloud benchmarks.
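To make the point about INT8 rounding behavior concrete, here is a minimal sketch of how two common tie-breaking rules can map the same float values to different INT8 values. It is illustrative only: the scale, the sample activations, and the helper names are hypothetical, and nothing here claims to reproduce what any particular Snapdragon NPU does.

```python
import math

def round_half_even(v):
    # Python's built-in round() breaks ties toward the nearest even integer.
    return int(round(v))

def round_half_away(v):
    # Ties go away from zero, a rule used by some rounding implementations.
    return int(math.copysign(math.floor(abs(v) + 0.5), v))

def quantize(x, scale, rounding):
    """Map a float to a signed INT8 value with the given rounding rule."""
    q = rounding(x / scale)
    return max(-128, min(127, q))  # clamp to the INT8 range

scale = 0.5  # hypothetical quantization scale
# Hypothetical activations chosen so that x / scale lands exactly on a tie.
activations = [0.25, 0.75, 1.25, -0.25, -1.25]

even = [quantize(a, scale, round_half_even) for a in activations]
away = [quantize(a, scale, round_half_away) for a in activations]
mismatches = sum(e != a for e, a in zip(even, away))

print("half-to-even:", even)
print("half-away   :", away)
print("values that differ:", mismatches)
```

Each difference is a single quantization step, but such differences can accumulate across layers, which is one plausible reason identical weights might not give identical accuracy on different hardware.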

34 comments

Posted in r/MachineLearning by u/hcarlens • 2/19/2026

198

[R] Analysis of 350+ ML competitions in 2025

Research

Machine learning competitions in 2025 saw a shift in the tools and methods used. Gradient-boosted decision trees remained popular for tabular data, but AutoML packages and tabular foundation models (TabPFN) saw increased use. Language/reasoning competitions favored Qwen2.5 and Qwen3 models over BERT-style models. For the first time, more vision competitions were won by Transformer-based models than by CNN-based ones. In audio contests involving human speech, OpenAI’s Whisper model was commonly fine-tuned. PyTorch was the dominant deep learning tool, with 20% of PyTorch-based solutions also using PyTorch Lightning. Despite its growing popularity in the data engineering community, Polars saw low uptake among competition winners.

8 comments

Posted in r/dataengineering by u/daxdaxy • 2/18/2026

181

Microsoft UI betrayal

Meme

Microsoft's UI, particularly in Azure Data Factory (ADF), drew notable criticism. Users expressed frustration, calling it poorly designed and not user-friendly, with some even regretting having adopted the technology. Complaints ranged from tables filled with NULL values to issues with the UI of SSIS, another Microsoft product that was also deemed subpar. Some users, however, pointed out a few redeeming qualities, such as the quick 'copy activity' feature in ADF. Overall, the sentiment is highly negative, with users wanting more user-centric design in Microsoft's tools.

24 comments

Posted in r/SQL by u/arrogant_definition • 2/19/2026

Dealing with professionals who don’t know SQL but need it.

PostgreSQL

A question about working with professionals who need SQL but do not know it sparked a discussion of potential solutions. The most upvoted suggestions included building basic reports or using Excel queries, as well as using tools like Metabase, an open-source tool that connects to Postgres and offers a visual query builder. Another recommendation was TalkBI, an AI tool that simplifies data pulling and visualization. Some suggested Power BI or Excel training for the team, while others proposed hiring a data or BI engineer. The overall sentiment was positive, with users offering a range of practical solutions.

40 comments

Posted in r/SQL by u/katokk • 2/16/2026

Best websites to practice SQL to prep for technical interviews?

Discussion

DataLemur, StrataScratch, and Mode Analytics SQL Tutorial are popular platforms for SQL practice, especially for mid-level analyst roles. Window functions like lag/lead, row_number, and dense_rank are frequently encountered in interviews, so comfort with these concepts is essential. Practicing writing queries from scratch on a blank dataset, not just solving predefined problems, is also recommended. Additional platforms mentioned include tailoredsim.com, FreeSQL.com, and skillsql.com. Some users also suggest downloading personal data for practice or utilizing public datasets. The overall sentiment is positive, with users sharing various resources and strategies for SQL interview preparation.
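As a quick refresher on the window functions named above, here is a small sketch that runs LAG, ROW_NUMBER, and DENSE_RANK through Python's built-in sqlite3 module. The `sales` table and its rows are made up for illustration, and the example assumes the bundled SQLite is version 3.25 or newer, which is when window function support was added.

```python
import sqlite3

# In-memory database with a small, hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (rep TEXT, month INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("ana", 1, 100), ("ana", 2, 150), ("ana", 3, 150),
     ("bo", 1, 200), ("bo", 2, 120)],
)

query = """
SELECT rep, month, amount,
       -- previous month's amount for the same rep (NULL for the first month)
       LAG(amount) OVER (PARTITION BY rep ORDER BY month) AS prev_amount,
       -- unique position per rep; ties on amount are broken by month
       ROW_NUMBER() OVER (PARTITION BY rep ORDER BY amount DESC, month) AS row_num,
       -- tied amounts share a rank, and no rank numbers are skipped
       DENSE_RANK() OVER (PARTITION BY rep ORDER BY amount DESC) AS amount_rank
FROM sales
ORDER BY rep, month
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The two tied 150 amounts for the same rep show the difference interviewers often probe: ROW_NUMBER assigns them distinct positions, while DENSE_RANK gives both the same rank.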

15 comments
