data subTLDR week 12 year 2026

r/MachineLearning, r/dataengineering, r/SQL

Discussing a Free SQL Practice Tool's Pros and Cons, FABCON/SQLCON Highlights, YouTube SQL Content Creators Recommendations, Essential Skills for Data Engineers in AI Era, and the Controversial ROI Dollars Trend in Resumes.

Week 12, 2026
Posted in r/MachineLearning by u/S4M22, 3/18/2026
184

[D] ICML rejects papers of reviewers who used LLMs despite agreeing not to

Discussion
ICML's decision to reject papers from reviewers who used large language models (LLMs) despite agreeing not to has been widely supported by the community. Many agree that maintaining integrity, especially in reviewing, is crucial to the proper functioning of academia. LLM use was detected through a watermark or prompt-injection method, which commenters praised for its precision and its ability to reduce false positives. Some concerns were raised about the impact on multi-author papers when only one author violates the policy. Overall, the sentiment was strongly in favor of strict measures against academic misconduct.
74 comments
Posted in r/MachineLearning by u/Benlus, 3/22/2026
184

[N] MIT Flow Matching and Diffusion Lecture 2026

News
The MIT 2026 course on flow matching and diffusion models, released by Peter Holderrieth and Ezra Erives, is highly appreciated for its comprehensive treatment of the methods behind modern AI image, video, and protein generators. Its coverage of discrete diffusion for language models is seen as a standout feature, which commenters view as making diffusion more competitive for text generation. The course is also praised for its practical approach, combining theory, derivations, and hands-on coding. The addition of new topics such as latent spaces and diffusion transformers is well received. Some commenters expressed interest in forming a study group to follow the lectures and work on the problems together.
19 comments
Posted in r/MachineLearning by u/NeighborhoodFatCat, 3/22/2026
166

[D] Has industry effectively killed off academic machine learning research in 2026?

Discussion
The role of academia in the field of machine learning (ML) is a hot topic. Some suggest that industry has surpassed academia in ML research due to vast resources and international talent, but others argue that despite the shift towards application-focused research in industry labs, academia continues to produce foundational ideas. In fact, the increasing lack of freedom in industry labs may be pulling talent back towards academia. While there's concern about academia focusing on less relevant or impractical topics, some defend its capacity to explore less-traveled paths, which can be more rewarding and meaningful. There's also a recognition of academia's key role in tackling complex problems like formal safety verification of autonomous systems. Overall, the sentiment is mixed, with a clear indication that both industry and academia have significant roles in advancing ML.
63 comments
Posted in r/MachineLearning by u/WhiteBear2018, 3/19/2026
123

ICLR 2026 oral with 2 rejects, 1 borderline reject

Discussion
Surprise and frustration are evident among commenters discussing an ICLR paper that received an oral presentation slot despite two rejects and a borderline reject. The consensus is that Area Chairs (ACs) wield too much power, sometimes overruling reviewers, which leads to seemingly inconsistent decisions. This year's ICLR process was also criticized for lacking a discussion period because of an OpenReview leak. Some expressed empathy for the author, acknowledging the subjectivity of reviews. Despite the controversy, a few commenters pointed out that sacrificing efficiency for theoretical guarantees can be seen as a step forward in research.
20 comments
Posted in r/dataengineering by u/rikulauttia, 3/16/2026
100

What data engineering skill matters more now because of AI?

Discussion
In the evolving field of data engineering, professionals are emphasizing the growing importance of soft skills like communication, as well as hard skills like data modeling and system design. They note that making data and lineage understandable to AI tools is critical for maintaining pipelines, and that large language models (LLMs) are now an integral part of their work. They also underscore the need for thorough design and code documentation, lineage traceability, and continual fixing of code deficiencies. The general sentiment is one of adaptation, with growing demand for skills that were previously considered secondary or niche in this field.
39 comments
Posted in r/dataengineering by u/BeautifulLife360, 3/16/2026
89

Unpopular opinion: The trend of having ROI dollars has ruined résumés.

Rant
Criticism surrounds the trend of listing ROI dollars on résumés, with a majority considering it misleading and unhelpful in assessing candidate abilities. Many believe these figures are easily manipulated and impossible to verify, leading to skepticism. However, some understand why candidates might use this approach, as it's a shorthand way to demonstrate impact in the brief review time recruiters typically give. A few also attribute the trend to tech giants and note their hesitance to hire employees from such companies. Despite criticism, some defend the practice, stating they began receiving recruiter attention only after including ROI numbers on their résumés. Overall sentiment: mixed.
43 comments
Posted in r/dataengineering by u/empty_cities, 3/16/2026
81

Is AI making you more productive in Data Engineering?

Discussion
AI is enhancing productivity in data engineering, enabling the creation of unique tools for streamlined work. However, many users express mixed sentiment due to increased workload and reduced headcount, seemingly negating the productivity gains. AI's benefits aren't always felt by the workers, with some lamenting the absence of time to learn from tasks completed by AI. There's also concern over the resultant bad practices and anti-patterns due to over-reliance on AI. Meanwhile, some find value in AI in smaller teams, where it aids in improving productivity and decision-making. Despite the advantages, there's a clear call for mindful implementation.
61 comments
Posted in r/SQL by u/CriticalofReviewer2, 3/22/2026
49

I built a machine learning model using only SQL (no ML libraries, no Python)

BigQuery
A developer has successfully constructed a machine learning model using only SQL, with no need for ML libraries or Python. This unusual approach was initially conceived for low-resource environments and runs the entire pipeline in a single query. The model seems to operate well within SQL's constraints, relying on aggregations rather than optimization loops, functioning more like a GROUP BY job than a transactional workload. This innovative, though unconventional, technique has sparked interest and admiration, along with some skepticism regarding potential deadlocks. Despite potential doubts, the overall sentiment leans towards impressed and curious.
10 comments
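
To make the "aggregations rather than optimization loops" idea concrete, here is a minimal sketch of training and prediction done in one SQL statement. Everything specific in it is an assumption for illustration: the summary does not say which model or schema the author used, so the nearest-centroid classifier, the `train` and `test` tables, and their columns are hypothetical, and the query runs on SQLite through Python's `sqlite3` module rather than on BigQuery.

```python
import sqlite3

# In-memory database with tiny, made-up training and test tables (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE train (x1 REAL, x2 REAL, label TEXT);
INSERT INTO train VALUES
  (1.0, 1.2, 'a'), (0.8, 1.0, 'a'), (1.2, 0.9, 'a'),
  (4.0, 4.1, 'b'), (4.2, 3.9, 'b'), (3.8, 4.0, 'b');
CREATE TABLE test (id INTEGER, x1 REAL, x2 REAL);
INSERT INTO test VALUES (1, 1.1, 1.0), (2, 3.9, 4.2);
""")

# One query: "training" is a GROUP BY that computes a centroid per class,
# and "prediction" picks the class whose centroid is closest to each test row.
QUERY = """
WITH centroids AS (
  SELECT label, AVG(x1) AS c1, AVG(x2) AS c2
  FROM train
  GROUP BY label
),
scored AS (
  SELECT t.id, c.label,
         (t.x1 - c.c1) * (t.x1 - c.c1)
       + (t.x2 - c.c2) * (t.x2 - c.c2) AS dist2
  FROM test AS t
  CROSS JOIN centroids AS c
),
best AS (
  SELECT id, MIN(dist2) AS dist2
  FROM scored
  GROUP BY id
)
SELECT s.id, s.label
FROM scored AS s
JOIN best AS b ON s.id = b.id AND s.dist2 = b.dist2
ORDER BY s.id;
"""

predictions = conn.execute(QUERY).fetchall()
print(predictions)
```

Because the query only reads and aggregates, it behaves like a batch GROUP BY job rather than a transactional workload, which matches the description in the summary above.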
Posted in r/SQL by u/ATastefulCrossJoin, 3/18/2026
34

Reporting in from FABCON / SQLCON - any knowers?

Discussion
There's high anticipation for the features of SQL Server 2025, with lively discussion around the software's history and evolution. A shared understanding emerged that SQL Server has come a long way since its inception, with some participants reflecting on their experiences using older versions. There were suggestions for improvements, such as incorporating LIMIT into the T-SQL grammar. However, there were also humorous comments about the past, reflecting a positive and nostalgic sentiment overall. A minority took issue with specific aspects, leading to a slightly mixed but mainly positive sentiment.
29 comments
Posted in r/SQL by u/bigjeanz, 3/17/2026
20

Any recommendations for YouTube specifically content creators on SQL.

MySQL
Brent Ozar is a highly recommended resource for SQL learning on YouTube, with multiple comments praising his content. Erik Darling is another channel suggested for SQL knowledge. Database Star is mentioned as being helpful for interview preparation, while Toufiq Tech and Practically Perfect PL/SQL with Steven Feuerstein are also recommended. One user mentioned 'SQL Server Radio' as an enjoyable podcast for SQL content. Remember, these suggestions are based on personal preferences and learning needs, so it's advisable to check them out to see which suits your style.
18 comments