data subTLDR week 3 year 2026

r/MachineLearning, r/dataengineering, r/SQL

Unveiling MySQL's Rigidity, SSMS's Enduring Popularity, the Primary Key Predicament, the AI Hiring Hoax, and the AI-Skill Deterioration Debate

January 18, 2026 • Week 3, 2026

Posted in r/dataengineering by u/Hercules1408 • 1/12/2026

289

Caught the candidate using AI for screening

Discussion

According to the thread, AI use in job screening is bringing in more incompetent candidates: many rely heavily on AI tools, which gives a false sense of competence and can cause problems once they are on a project. This is especially noticeable in remote interviews. Regular expressions (RegEx) were cited as a commonly misused tool: many candidates use them but cannot explain how they work. Job descriptions have also grown increasingly complex, setting unrealistic expectations. Commenters called for more discerning hiring practices, likening reliance on AI to using a delivery service and claiming to have cooked the meal. Sentiment toward AI-dependent applicants is predominantly negative.
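For readers wondering what "explaining a regex" looks like in practice, here is a small illustration. The date pattern and the `parse_date` helper are hypothetical examples, not taken from the thread; Python's `re.VERBOSE` flag lets each piece of a pattern carry its own comment.

```python
import re
from typing import Optional

# Hypothetical example pattern: an ISO-style date such as 2026-01-12.
# re.VERBOSE ignores whitespace inside the pattern and allows # comments.
ISO_DATE = re.compile(
    r"""
    (?P<year>\d{4})                   # exactly four digits for the year
    -(?P<month>0[1-9]|1[0-2])         # month 01-12
    -(?P<day>0[1-9]|[12][0-9]|3[01])  # day 01-31 (month length is NOT checked)
    """,
    re.VERBOSE,
)


def parse_date(text: str) -> Optional[dict[str, str]]:
    """Return the named groups if the WHOLE string matches, else None."""
    match = ISO_DATE.fullmatch(text)  # fullmatch: no leading/trailing extras allowed
    return match.groupdict() if match else None


print(parse_date("2026-01-12"))  # {'year': '2026', 'month': '01', 'day': '12'}
print(parse_date("2026-13-01"))  # None: 13 is not a valid month
```

Note the limitation stated in the day comment: "2026-02-31" still matches, which is exactly the kind of edge case an interviewer might ask a candidate to explain.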

92 comments

Posted in r/MachineLearning by u/44th--Hokage • 1/15/2026

236

Nvidia: End-to-End Test-Time Training for Long Context aka Being Able To Update A Model's Weights In Real-Time As You Use It | "TTT changes the paradigm from retrieving info to learning it on the fly...the TTT model treats the context window as a dataset & trains itself on it in real-time." [R]

Research

Nvidia's Test-Time Training (TTT) approach updates a model's weights in real time as it processes input, separating intelligence from memory: instead of retrieving information from the context window, the model treats that window as a dataset and trains on it on the fly. The reported payoff is fast responses with high accuracy that scale to long contexts, including a 2.7x speedup over full attention at 128K context, which commenters praised. Discussion centered on the risk of catastrophic forgetting in continual learning; proponents countered that only some weights are updated while a static "safe" copy is kept. Others raised possible barriers to scaling and concerns about conflating training with inference, pointing to real engineering challenges.
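A toy sketch of the core idea, not Nvidia's code. It assumes a TTT-Linear-style setup: the "fast weights" are a single linear map, the self-supervised task is predicting each context vector from the previous one, and adaptation is plain SGD on a copy of frozen base weights (mirroring the "static safe copy" point above). The names `adapt_on_context`, `context_loss`, and the toy data are hypothetical.

```python
import copy

Vector = list[float]
Matrix = list[list[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def context_loss(w: Matrix, context: list[Vector]) -> float:
    """Mean squared error of predicting each context vector from the previous one."""
    pairs = list(zip(context, context[1:]))
    total = 0.0
    for x, y in pairs:
        total += sum((p - t) ** 2 for p, t in zip(matvec(w, x), y))
    return total / len(pairs)


def adapt_on_context(base_w: Matrix, context: list[Vector],
                     lr: float = 0.05, epochs: int = 20) -> Matrix:
    """Train a copy of base_w on the context itself; base_w is never modified."""
    w = copy.deepcopy(base_w)  # "fast" weights; the base copy stays frozen
    for _ in range(epochs):
        for x, y in zip(context, context[1:]):
            err = [p - t for p, t in zip(matvec(w, x), y)]
            # Gradient of ||W x - y||^2 with respect to W is 2 * err * x^T.
            for i, row in enumerate(w):
                for j in range(len(row)):
                    row[j] -= lr * 2 * err[i] * x[j]
    return w


def shift(v: Vector) -> Vector:
    """Toy 'language': each vector is the previous one rotated right by one slot."""
    return [v[-1]] + v[:-1]


context = [[1.0, 0.5, -0.25]]
for _ in range(9):
    context.append(shift(context[-1]))

base = [[0.0] * 3 for _ in range(3)]
fast = adapt_on_context(base, context)
print(context_loss(fast, context) < context_loss(base, context))  # True
```

The design choice to adapt a deep copy is what addresses the forgetting concern in this sketch: the frozen `base` can always be restored, while `fast` holds what was learned from this particular context.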

20 comments

Posted in r/dataengineering by u/The-CAPtainn • 1/16/2026

210

Anyone else losing their touch?

Discussion

The discussion centers on growing reliance on AI at work, particularly in coding and data engineering roles. Many professionals agree that AI boosts productivity but worry that their own skills are deteriorating from underuse. Others stress the importance of thoroughly reviewing and understanding AI-generated code, citing cases where skipping that step let significant errors through. Some participants even suggest that heavy reliance on AI could lead to job displacement. The tone remains mixed, however: many still value AI as a tool that enhances their work, provided it is used responsibly and alongside continuous learning.

109 comments

Posted in r/MachineLearning by u/AhmedMostafa16 • 1/12/2026

117

[R] Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings

Research

Sakana AI's new method, DroPE, extends the context length of pretrained LLMs by challenging a fundamental assumption of the Transformer architecture. As the top Reddit comments summarize it, explicit positional embeddings such as RoPE (rotary position embeddings) are critical for training convergence but become a bottleneck that keeps models from generalizing to longer sequences. The proposed fix is to train with RoPE first, then drop the RoPE encodings for a few epochs; this reportedly lets the model carry over a learned, implicit representation of positional information. Some users suggest that combining training with RoPE and NoPE (no positional embeddings) might do better than DroPE. Overall, sentiment towards the method is positive.
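To make "dropping" positional embeddings concrete, here is a minimal sketch at the level of a single attention score. It assumes standard RoPE that rotates interleaved pairs of dimensions with base 10000; `apply_rope` and `attention_score` are hypothetical helper names, and DroPE's actual training schedule is not shown. Setting `use_rope=False` mimics a NoPE model.

```python
import math


def apply_rope(x: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Rotate each pair (x[2i], x[2i+1]) by angle pos * base**(-2i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out = list(x)
    for i in range(d // 2):
        angle = pos * base ** (-2 * i / d)
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def attention_score(q: list[float], k: list[float], q_pos: int, k_pos: int,
                    use_rope: bool = True) -> float:
    """Scaled dot-product score; use_rope=False ignores positions entirely (NoPE)."""
    if use_rope:
        q, k = apply_rope(q, q_pos), apply_rope(k, k_pos)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(len(q))


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.6, 0.9]

# With RoPE the score depends only on the offset q_pos - k_pos (3 in both calls).
print(math.isclose(attention_score(q, k, 5, 2), attention_score(q, k, 105, 102)))  # True
# With RoPE dropped, the score is the same no matter where q and k sit.
print(attention_score(q, k, 5, 2, use_rope=False)
      == attention_score(q, k, 900, 0, use_rope=False))  # True
```

The first check shows why RoPE encodes relative position; the second shows what a model loses (and must recover implicitly) once the rotation is removed.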

23 comments

Posted in r/dataengineering by u/shittyfuckdick • 1/13/2026

115

Im Burnt Out

Career

The thread follows a data engineer (DE) burning out under a growing workload and a management change that brought a preference for SSIS (SQL Server Integration Services) over Python. Many respondents empathize and share their own dislike of SSIS, saying it often creates more complications than it solves. The consensus is that setting boundaries and prioritizing tasks effectively are critical to managing workload and stress. There is clear dissatisfaction with management practices that lead to burnout like this. Participants also stress self-care, insisting that no job is worth risking your health or personal well-being.

91 comments

Posted in r/SQL by u/zesteee • 1/14/2026

Is SSMS still widely used?

Discussion

SQL Server Management Studio (SSMS) remains an industry standard for SQL Server and Azure SQL Database, even though knowing it is not considered a standalone skill. It is widely used to connect to and manage relational databases. Users appreciate that SSMS can connect to Azure SQL Database and Fabric, often finding it superior to those platforms' built-in UIs. A growing number of professionals also use Visual Studio Code for quick ad hoc queries. Despite newer tools, SSMS remains trusted and continues to be updated and improved. Sentiment leans positive, reflecting users' comfort and satisfaction with SSMS.

65 comments

Posted in r/SQL by u/Pleasant-Insect136 • 1/17/2026

There’s no column or even combination of columns that can be considered as a pk, what would your approach be?

Discussion

An intern struggled to find a primary key (PK) in a dataset: even combinations of columns were only 85% distinct. Many suggested generating a new unique ID column (a surrogate key) to serve as the PK. Others recommended cleaning duplicate entries out of the dataset or normalizing the data. Some advised asking experienced coworkers or managers to better understand the data. The sentiment was largely positive, reflecting the community's willingness to help, with an emphasis on learning and collaboration and on the importance of data cleaning and normalization.
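A quick way to apply the thread's advice, sketched in Python with hypothetical column names and rows: score candidate keys by how distinct they are. If even all columns combined fall short of 100%, the table contains exact duplicate rows, which no natural key can tell apart. A surrogate key makes every row addressable, but it does not answer whether those duplicates are real events or errors to clean.

```python
from itertools import combinations

Row = dict[str, object]


def distinct_ratio(rows: list[Row], cols: tuple[str, ...]) -> float:
    """Share of distinct value combinations in `cols` (1.0 means a valid key)."""
    keys = {tuple(row[c] for c in cols) for row in rows}
    return len(keys) / len(rows)


def rank_candidate_keys(rows: list[Row],
                        max_cols: int = 3) -> list[tuple[float, tuple[str, ...]]]:
    """Score every column combination up to max_cols; most distinct, fewest columns first."""
    columns = list(rows[0])
    scored = [
        (distinct_ratio(rows, cols), cols)
        for n in range(1, max_cols + 1)
        for cols in combinations(columns, n)
    ]
    return sorted(scored, key=lambda item: (-item[0], len(item[1])))


def add_surrogate_key(rows: list[Row], key_name: str = "row_id") -> list[Row]:
    """Prepend a sequential integer ID, unique by construction."""
    return [{key_name: i, **row} for i, row in enumerate(rows, start=1)]


rows = [
    {"customer": "A", "sku": "x", "qty": 1},
    {"customer": "A", "sku": "y", "qty": 1},
    {"customer": "A", "sku": "x", "qty": 1},  # exact duplicate of the first row
]

# Even all three columns together are not fully distinct: no natural key exists.
print(distinct_ratio(rows, ("customer", "sku", "qty")) < 1.0)  # True
print(distinct_ratio(add_surrogate_key(rows), ("row_id",)))  # 1.0
```

Inside SQL Server itself, the usual equivalent of `add_surrogate_key` is an `IDENTITY` column.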

55 comments

Subscribe to data-subtldr

Get weekly summaries of top content from r/dataengineering, r/MachineLearning and more directly in your inbox.