data subTLDR week 4 year 2026

r/MachineLearning, r/dataengineering, r/SQL

Unlocking SQL's Potential: An Addictive Learning Journey, Performance Wins with Composite Indexes, A New Terminal-based Client, Management's Debatable Role, and ADHD Challenges in Data Engineering

January 25, 2026 • Week 4, 2026

Posted in r/MachineLearning by u/mgcdot • 1/22/2026

352

[D] 100 Hallucinated Citations Found in 51 Accepted Papers at NeurIPS 2025

Discussion

A total of 100 'hallucinated citations' (erroneous references) were identified across 51 papers accepted at NeurIPS 2025, which is just over 1% of accepted submissions. Commenters suggested that some of these may stem from mishandled BibTeX entries or from AI tools used for citation management. One widely discussed error involved an undergraduate paper wrongly believed to be the origin of the ReLU function. NeurIPS maintains that citation errors do not invalidate a paper's overall content, and stresses the need for more rigorous review processes. Sentiment was mixed, with some commenters concerned about the implications for academic integrity.

75 comments


Posted in r/dataengineering by u/Thinker_Assignment • 1/21/2026

221

This will work, yes??

Meme

Most commenters were skeptical that management would take on hands-on work, suggesting managers are more likely to put pressure on contract workers instead. Anecdotes about managers attempting technical tasks suggested that these efforts tend to cause problems, such as a costly bug in a rewritten notebook. A minority of respondents, presumably managers themselves, said they do work on tasks directly. Overall sentiment toward management's role in operational tasks was negative.

8 comments


Posted in r/MachineLearning by u/ThatAi_guy • 1/20/2026

214

[P] I Gave Claude Code 9.5 Years of Health Data to Help Manage My Thyroid Disease

Project

The poster used 9.5 years of data from their Apple Watch and Whoop to build a machine learning model that reportedly detects phases of their episodic Graves' disease with 98% accuracy. The model, implemented in an iOS app, alerts the user weeks before symptoms occur. The community appreciated the initiative but raised concerns about data privacy and overfitting. Some questioned the high accuracy figure, suggesting it could result from data leakage or from the model mostly predicting that no episode is occurring. The poster clarified that the data was handled locally and that the model was trained with the episodic nature of the disease in mind.
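The commenters' point about accuracy can be illustrated with a small sketch. The numbers below are invented for illustration (1,000 days with 20 episode days) and are not the poster's data. Under that assumption, a model that never predicts an episode still scores high accuracy while catching no episodes at all, which is why recall on episode days is worth checking alongside accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positive-class days that were predicted positive."""
    predictions_on_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in predictions_on_positives) / len(predictions_on_positives)


# Hypothetical labels: 1 = episode day, 0 = no episode (assumed 2% episode rate).
y_true = [1] * 20 + [0] * 980

# A trivial "model" that always predicts "no episode".
always_negative = [0] * len(y_true)

baseline_accuracy = accuracy(y_true, always_negative)
baseline_recall = recall(y_true, always_negative)

print(f"accuracy: {baseline_accuracy:.2%}, recall on episodes: {baseline_recall:.2%}")
```

With these made-up labels the trivial baseline reaches 98% accuracy with 0% recall, so a reported accuracy is only informative relative to the class balance of the data.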

49 comments


Posted in r/MachineLearning by u/ParticularWork8424 • 1/25/2026

169

[D] ICML 2026 - ICML desk-rejected my paper but kept me on as a reviewer. Wow?

Discussion

Sentiment toward ICML's practice of desk-rejecting papers while retaining their authors as reviewers was largely negative, reflecting frustration with the academic system. Many see it as an example of academia exploiting unpaid labor, with one comment calling the system's reliance on invisible labor problematic. Some suggested refusing to review in response, while others pointed to the impersonal nature of the process, explaining that desk rejection often comes down to scope or formatting and is unrelated to reviewer selection. Commenters also called for more transparency from conferences so that authors are less likely to take such decisions personally.

60 comments


Posted in r/dataengineering by u/psgpyc • 1/19/2026

150

Any data engineers here with ADHD? What do you struggle with the most?

Help

Data engineers with ADHD described struggles such as forgetting config details, feeling overwhelmed by even small task lists, constantly seeking validation, forgetting tools they do not use regularly, and context switching. They identified meetings and changing requirements as major challenges. Agile was seen as problematic when misused, creating tech debt and burdening engineers. Suggested coping strategies included to-do managers for maintaining focus, planning the week and day in advance, and cognitive therapy. Having a data analyst on the team and being open with colleagues about ADHD were also recommended. Overall sentiment was mixed, with participants both seeking and sharing coping strategies.

82 comments


Posted in r/dataengineering by u/HiddenStanLeeCameo • 1/20/2026

139

Spending >70% of my time not coding/building - is this the norm at big corps?

Discussion

According to commenters, most data engineers at large corporations spend more time on administrative tasks than on actual coding. Comparisons were made to lawyers, surgeons, and architects, who also do not spend most of their time on their primary skill. The larger the organization, the more red tape and bureaucracy, and the more non-technical work results. Some find this norm unfulfilling and prefer consultancy roles. It was also suggested that moving from Data Engineer to Solution Engineer, a role that often involves more administrative duties, can be lucrative. Overall sentiment was mixed, with both acceptance of the status quo and dissatisfaction present.

36 comments


Posted in r/MachineLearning by u/Training-Adeptness57 • 1/19/2026

117

[R] Is Leetcode still relevant for research scientist interviews?

Research

Leetcode remains a significant factor in research scientist interviews, especially at major tech companies like Meta and DeepMind. The focus is not necessarily on solving a Leetcode Medium in 30 minutes, but on demonstrating clean code and thoughtful problem-solving. Even so, candidates with impressive profiles may still be rejected if they cannot solve a medium-difficulty problem in a reasonable timeframe. The process varies between roles and companies, with some requiring a Leetcode-style CodeSignal assessment followed by a live coding session. Startups and research divisions in non-tech sectors may not emphasize Leetcode as much. Sentiment was mixed, with some expressing frustration at the continued reliance on these assessments.

46 comments


Posted in r/SQL by u/waitthissucks • 1/22/2026

I think I might be addicted to learning SQL?

Discussion

The beginner's journey in learning SQL is characterized by alternating periods of frustration and clarity, with a night's rest often leading to breakthroughs. Commenters described this roller coaster as normal and even beneficial, since the brain keeps processing information in the background. The usefulness of SQL across job roles, especially data analysis, was emphasized, with one user sharing a career-transforming experience. Messy real-world data was highlighted as a challenge that self-learning tools often overlook. Users suggested exploring Docker, PostgreSQL, MySQL, and MariaDB, and learning the fundamentals of relational database design. The need for better data input systems to manage errors and inconsistencies was also discussed. Overall sentiment was positive and encouraged continued learning.

25 comments


Posted in r/SQL by u/dataSommelier • 1/21/2026

Performance Win: If you filter on multiple columns, check out composite indexes. We just cut a query from 8s to 2ms.

PostgreSQL

The discussion centers on using composite indexes to speed up queries on large tables, with the original post reporting a query time reduced from 8s to 2ms. A top comment detailed the importance of ordering the columns of a composite index correctly for optimal results. Other comments stressed always checking the execution plan when optimizing performance and considering other factors such as data distribution and the use of SELECT * in queries. Replies both backed up the original post's point and offered additional insights for further optimization, giving the thread a positive, instructional tone overall.
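As a rough illustration of the thread's two main points (composite indexes and checking the execution plan), here is a minimal sketch. It uses SQLite through Python's standard library rather than PostgreSQL, which the original post is tagged with, so that it runs without a database server. The `orders` table, its columns, the index name, and the `query_plan` helper are all hypothetical, and planner behavior in PostgreSQL may differ.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, status, total) VALUES (?, ?, ?)",
    [(i % 1000, "open" if i % 10 == 0 else "closed", float(i)) for i in range(20000)],
)

both_columns = "SELECT id, total FROM orders WHERE customer_id = ? AND status = ?"
status_only = "SELECT id, total FROM orders WHERE status = ?"

# Without an index, the filter has to scan the whole table.
plan_before = query_plan(conn, both_columns, (42, "open"))

# Composite index: customer_id is the leading column, status the second.
conn.execute(
    "CREATE INDEX idx_orders_customer_status ON orders (customer_id, status)"
)

# With the index, a filter on both columns can become an index search.
plan_after = query_plan(conn, both_columns, (42, "open"))

# Filtering only on the second column cannot seek on the leading column.
plan_status_only = query_plan(conn, status_only, ("open",))

for label, plan in [
    ("before index", plan_before),
    ("after index", plan_after),
    ("status only", plan_status_only),
]:
    print(label, "->", plan)
```

The last plan is the column-ordering point in miniature: an index on (customer_id, status) helps filters that include customer_id, while a query filtering only on status typically falls back to a scan. Which column should lead depends on the queries and the data distribution, which is why the commenters recommend reading the execution plan instead of guessing.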

15 comments


Posted in r/SQL by u/xGoivo • 1/20/2026

I created an SQL database client that runs in the terminal (dbeaver terminalized!) - Pam's Database Drawer [FOSS]

Discussion

The developer of Pam, a terminal-based SQL database client, has released its first beta version. This lightweight, efficient tool supports several databases and allows for reusable queries, interactive table views, and listing of tables and views from a database connection. The tool is written in Go and is open-source. Initial responses to the tool have been positive, with users appreciating its functionality and efficiency. The developer is open to user feedback and suggestions for improvements, particularly regarding user experience and interface enhancements.

2 comments

