data subTLDR week 21 year 2026

r/MachineLearning, r/dataengineering, r/SQL

Mastering Window-Function Tricks, Surviving Tough Interviews, Identifying SQL Red Flags, Debating Data Engineering Excitement, Navigating Career Stagnation

May 24, 2026 • Week 21, 2026

Posted in r/MachineLearning by u/NielsRogge • 5/18/2026

357

Reviving PapersWithCode (by Hugging Face) [P]

Research

Hugging Face's revival of PapersWithCode has been warmly received, particularly by academics and researchers who rely on the site to keep up with models, datasets, and methods. The new site features trending papers, domain categorization, eval results for high-impact papers, automatic linking of GitHub and other project URLs, and more. Users requested features such as a way to flag misclassified papers and community benchmarks for greater consistency. Some offered to help improve the platform, stressing its value to the research community. Overall, the sentiment is highly positive.

33 comments

Posted in r/SQL by u/FixelSmith • 5/19/2026

146

Eight window-function tricks beyond LAG and ROW_NUMBER

Discussion

Commenters found the post on window-function tricks insightful and well organized. Some asked for examples of the RANGE syntax and suggested noting platform compatibility for each function; the author clarified the syntax and acknowledged that compatibility coverage should be more consistent. The term "gaps and islands," which some initially suspected was AI-generated, is in fact a long-standing SQL term. The post was also praised for avoiding excessive warehouse-specific detail, which could have been confusing. A few users said they would apply these tricks at work, particularly on Snowflake. The overall sentiment is positive.

16 comments
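
Since the thread asked for concrete RANGE examples, here is a hedged sketch (not the original post's code) of two of the patterns it mentions: the classic gaps-and-islands trick and a RANGE-based frame. It runs the SQL through Python's built-in `sqlite3` module against a hypothetical `activity` table, and it assumes SQLite 3.28 or newer, which is the first version that accepts numeric RANGE offsets. Frame syntax and support vary by engine, which echoes the thread's point about platform compatibility.

```python
import sqlite3

# Hypothetical toy data: one row per day a user was active, with an amount.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (user_id INTEGER, day INTEGER, amount INTEGER);
    INSERT INTO activity VALUES
        (1, 1, 10), (1, 2, 20), (1, 3, 5),
        (1, 7, 8),  (1, 8, 2),
        (2, 4, 1);
""")

# Gaps and islands: within an unbroken run of consecutive days,
# day - ROW_NUMBER() stays constant, so grouping on it collapses each run.
islands = conn.execute("""
    WITH numbered AS (
        SELECT user_id, day,
               day - ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY day) AS grp
        FROM activity
    )
    SELECT user_id, MIN(day) AS start_day, MAX(day) AS end_day
    FROM numbered
    GROUP BY user_id, grp
    ORDER BY user_id, start_day
""").fetchall()

# RANGE frame: bounds are measured on the ORDER BY value, not on row count,
# so "2 PRECEDING" means "days within 2 of the current day", even across gaps.
rolling = conn.execute("""
    SELECT user_id, day,
           SUM(amount) OVER (
               PARTITION BY user_id ORDER BY day
               RANGE BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS sum_last_3_days
    FROM activity
    ORDER BY user_id, day
""").fetchall()

print(islands)
print(rolling)
```

With ROWS instead of RANGE, day 7 would have summed the two previous rows (days 2 and 3) despite the gap; RANGE keeps the window tied to actual day values.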

Posted in r/MachineLearning by u/NielsRogge • 5/24/2026

145

PapersWithCode new features - week 1 [P]

Project

The open-source team at Hugging Face has revived and updated PapersWithCode. New features include support for multiple metrics per benchmark, papers from sources beyond arXiv, paper lineage, and new methods based on popularity. Leaderboards can now be screenshotted for easy sharing, and many more evals have been added. Feedback is positive, with requests for more interactivity, such as a paper-claiming mechanism that lets authors edit information about their own papers, sharing of the backend database, and additional filters for metrics. Some users miss domains from the old site and suggest adding more, such as recommendation systems.

9 comments

Posted in r/dataengineering by u/SeveralCherry7350 • 5/18/2026

138

Data Engineering is boring!

Discussion

Reactions to the claim that data engineering is boring are mixed. Some engineers feel the role has lost its excitement as AI automates more tasks, leaving the job feeling like a 'tool and jargon graveyard.' Others see this as an opportunity to grow and diversify into areas such as AI, infrastructure management, and back-end or front-end development, and the pay is also recognized as a benefit. The rough consensus is that while the work can be mundane, it remains rewarding in terms of job stability and income, with room to grow for those who are proactive and curious.

87 comments

Posted in r/dataengineering by u/Ok_Illustrator_816 • 5/20/2026

137

DE feels like a dead end beyond 4 years at the same company

Discussion

Many data engineers feel stagnant after several years at the same company, particularly once their work is fully automated and new projects are rare. Some appreciate the downtime for personal projects or hobbies, while others feel stuck, especially when the job market discounts their experience because specific tools like Databricks or Snowflake are missing from their resume. Some suggest earning relevant certifications or simply listing these skills on their resume after self-study, while others warn that skills can deteriorate during periods of low workload. Overall, the sentiment is mixed, with varied strategies for navigating this common issue.

55 comments

Posted in r/dataengineering by u/Acinac • 5/20/2026

116

VP told me to 'just use Cowork' to fix years of data chaos in a month. I am losing my mind.

Rant

The poster, a data engineer at a large conglomerate, is under pressure to use an AI tool, Claude Cowork, to organize years of chaotic, unstandardized data from different sources within a month in order to meet an AI-driven revenue goal. The task is complicated by the absence of data management systems, inconsistent naming conventions, duplicate IDs, and no linkage between original work orders and test tasks. Most commenters, weighted by upvotes, suggest complying with the request and letting the expected poor results illustrate the problem. Others advise finding a new job, creating a reference dataset for Claude Cowork, or focusing on high-leverage data cleanup. The overall sentiment is negative, with little confidence that clean, usable data can be produced in a month.

28 comments
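
One suggestion was to build a reference dataset. As an illustration only, here is a Python sketch of the kind of deterministic, high-leverage cleanup that idea implies: normalizing inconsistent names against a reviewed mapping, surfacing values that are not yet mapped, and flagging duplicate IDs for manual review. The `REFERENCE` mapping, the record fields, and the helper names are all hypothetical; nothing here reflects the poster's actual data or how Claude Cowork works.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical reference table: messy site spellings -> canonical name.
# In practice this would be curated and reviewed by people who know the data.
REFERENCE = {
    "plant a": "Plant A",
    "planta": "Plant A",
    "plant-a": "Plant A",
    "plant b": "Plant B",
}


def normalize_key(raw: str) -> str:
    """Lowercase, trim, and collapse runs of whitespace before lookup."""
    return re.sub(r"\s+", " ", raw.strip().lower())


def canonicalize(raw: str) -> str | None:
    """Return the canonical name, or None if the value is not mapped yet."""
    return REFERENCE.get(normalize_key(raw))


def duplicate_ids(records: list[dict]) -> set[str]:
    """Return IDs that appear on more than one record."""
    counts = Counter(rec["id"] for rec in records)
    return {rec_id for rec_id, n in counts.items() if n > 1}


# Hypothetical work-order records with inconsistent naming and a reused ID.
records = [
    {"id": "WO-1", "site": "Plant A "},
    {"id": "WO-2", "site": "PLANTA"},
    {"id": "WO-2", "site": "plant  b"},
    {"id": "WO-3", "site": "Plant C"},
]

cleaned = [{**rec, "site_canonical": canonicalize(rec["site"])} for rec in records]
unmapped = sorted({rec["site"] for rec in cleaned if rec["site_canonical"] is None})
dupes = duplicate_ids(records)

print(cleaned)
print(unmapped)
print(dupes)
```

Keeping the mapping explicit means unknown values such as "Plant C" surface for review instead of being silently guessed.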

Posted in r/MachineLearning by u/NutInBobby • 5/20/2026

109

OpenAI claims a general-purpose reasoning model found a counterexample to Erdos's unit-distance bound [D]

Discussion

OpenAI's claim that its general-purpose reasoning model found a counterexample to Erdős's unit-distance bound drew a range of reactions. Some commenters saw it as exceptional evidence of AI's potential for autonomous research, while others questioned the novelty and reproducibility of the finding. Much of the discussion centered on the model's capacity to generate and evaluate counterexamples, which could advance mathematical research. The validity of the proof was largely accepted because recognized experts had verified it. However, some called for more transparency about the model's working process, including the prompt used and the extent of human mathematicians' involvement. The overall sentiment was mixed but leaned positive.

37 comments

Posted in r/SQL by u/TraumaBondage • 5/22/2026

Pretty sure I just blew the biggest interview of my life. AMA!

SQL Server

Commenters broadly agreed that relying on Google for technical questions during the interview process is realistic and acceptable. Many argued that employers who expect candidates to know every answer without help may not offer a healthy work environment. The discussion also stressed the human side of technical problems, such as understanding the context and the user's perspective before diving into a solution. Commenters were generally empathetic toward the original poster, acknowledging the tough job market and the pressure to stand out among many candidates.

44 comments

Posted in r/SQL by u/badboyzpwns • 5/20/2026

What are common SQL red flags?

PostgreSQL

Commonly cited red flags in SQL writing include vague join aliases such as a, b, c instead of readable abbreviations of table names. Careful proofreading and correct spelling are also valued. Formatting for readability gets strong emphasis, with poorly formatted code often read as a sign of weak skills or inexperience. Other issues mentioned include inconsistent keyword casing, non-descriptive common table expression (CTE) names, writing a bare JOIN where some prefer an explicit INNER JOIN or LEFT OUTER JOIN, masking duplicate rows with SELECT DISTINCT instead of fixing the join, and using RIGHT JOIN. Overall, clarity and readability seem paramount.

186 comments
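
To make the SELECT DISTINCT point concrete, here is a hedged sketch using Python's built-in `sqlite3` module and a hypothetical orders/shipments schema (not taken from the thread). DISTINCT makes the row listing look correct while the join still duplicates orders, and the duplication becomes a wrong number as soon as the query aggregates. A semi-join with descriptive aliases avoids the fan-out.

```python
import sqlite3

# Hypothetical schema: one order can have several shipments.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders    (order_id INTEGER PRIMARY KEY, customer TEXT, total INTEGER);
    CREATE TABLE shipments (shipment_id INTEGER PRIMARY KEY, order_id INTEGER);
    INSERT INTO orders    VALUES (1, 'acme', 100), (2, 'acme', 50), (3, 'globex', 70);
    INSERT INTO shipments VALUES (10, 1), (11, 1), (12, 2);
""")

# Red flag: order 1 has two shipments, so the join fans out, and DISTINCT hides it.
# The single-letter aliases (a, b) also obscure which table is which.
masked = conn.execute("""
    SELECT DISTINCT a.order_id, a.customer, a.total
    FROM orders a
    JOIN shipments b ON a.order_id = b.order_id
    ORDER BY a.order_id
""").fetchall()

# The hidden fan-out shows up once you aggregate: order 1 is counted twice.
wrong_revenue = conn.execute("""
    SELECT SUM(a.total)
    FROM orders a
    JOIN shipments b ON a.order_id = b.order_id
""").fetchone()[0]

# Clearer: descriptive aliases and a semi-join (EXISTS), so each shipped
# order contributes exactly once regardless of how many shipments it has.
shipped_revenue = conn.execute("""
    SELECT SUM(ord.total)
    FROM orders AS ord
    WHERE EXISTS (
        SELECT 1 FROM shipments AS shp WHERE shp.order_id = ord.order_id
    )
""").fetchone()[0]

print(masked)           # looks fine: orders 1 and 2, one row each
print(wrong_revenue)    # inflated by the duplicate row for order 1
print(shipped_revenue)  # each shipped order counted once
```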

Subscribe to data-subtldr

Get weekly summaries of top content from r/dataengineering, r/MachineLearning and more directly in your inbox.
