
data subTLDR: Week 45, 2025

r/MachineLearning, r/dataengineering, r/SQL

SQL's 'Group by all' Feature Sparks Debate, Normal Forms (1NF, 2NF, 3NF) Vex Users, Data Quality: Product Management or Engineering Responsibility?, When ETL Jobs Morph into Critical Systems

Posted in r/MachineLearning by u/Alieniity on 11/4/2025
283

[R] Knowledge Graph Traversal With LLMs And Algorithms

Research
The research project on Knowledge Graph Traversal with LLMs and Algorithms sparked a constructive discussion about terminology and the assumptions underlying the research. The top comment pointed out that the project is more accurately described as a 'semantic similarity graph' than a 'knowledge graph', since knowledge graphs contain structured facts, not just unstructured text. Another comment urged the author to clarify assumptions such as how documents are chunked, and to consider how the system could accommodate updates, a characteristic of knowledge graphs. The author acknowledged these points and expressed willingness to revise the work for accuracy. Commenters also shared further learning resources and possible improvements. Overall, the sentiment was positive, with a focus on improving the accuracy and understanding of the research.
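The distinction raised in the top comment can be made concrete with a small sketch. It is illustrative only and is not the author's implementation: the triples, the text chunks, the toy bag-of-words `embed` function, and the 0.15 threshold are all assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter

# Knowledge graph: structured facts stored as (subject, predicate, object) triples.
# These triples are illustrative examples, not data from the post.
knowledge_graph = [
    ("Marie Curie", "born_in", "Warsaw"),
    ("Marie Curie", "field", "Physics"),
    ("Warsaw", "capital_of", "Poland"),
]

def neighbors(triples, subject):
    """Follow the typed edges leaving one entity."""
    return [(p, o) for s, p, o in triples if s == subject]

# Semantic similarity graph: nodes are raw text chunks, edges are similarity scores.
chunks = [
    "Marie Curie was born in Warsaw",
    "Warsaw is the capital of Poland",
    "Gradient descent minimizes a loss function",
]

def embed(text):
    """Toy bag-of-words vector (assumption); a real system would use a learned embedding."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def similarity_graph(texts, threshold=0.15):
    """Connect every pair of chunks whose similarity reaches the threshold."""
    vecs = [embed(t) for t in texts]
    edges = {}
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            score = cosine(vecs[i], vecs[j])
            if score >= threshold:
                edges[(i, j)] = score
    return edges

sim_edges = similarity_graph(chunks)
print(neighbors(knowledge_graph, "Marie Curie"))
print(sim_edges)
```

The first structure answers "what is the typed relation between these entities?", while the second only says "these two chunks look alike", which is the gap the commenter was pointing at.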
24 comments
Posted in r/MachineLearning by u/Fair-Rain3366 on 11/5/2025
198

Reasoning models don't degrade gracefully - they hit a complexity cliff and collapse entirely [Research Analysis] [R]

Research
A recent analysis of 18 papers on reasoning model limitations has revealed concerning results. These models perform well up to a certain complexity threshold, then suddenly collapse. In tasks requiring both math and commonsense reasoning, accuracy drops significantly, indicating that these models don't combine capabilities but fragment them. The discussion has generated various views, including that Large Reasoning Models (LRMs) are function approximators whose performance decreases as complexity and the need for symbolic work increase. Some argue this is due to a knowledge gap issue, while others think experience allows us to encode recurring patterns more efficiently. The overall sentiment is mixed, with many seeking a deeper understanding of these models' limitations.
38 comments
Posted in r/dataengineering by u/NewLog4967 on 11/6/2025
189

Unpopular Opinion: Data Quality is a product management problem, not an engineering one.

Discussion
The thread discusses the idea that data quality should be a product management issue, not a data engineering one. The majority of participants agree, arguing that data engineers often clean up after changes in business logic without prior notice. They suggest data quality should be part of the initial product criteria, with engineers refusing to build pipelines until quality expectations are signed off. Criticism is directed at both data engineers and product managers for not understanding the impacts of data quality issues or taking proactive measures. Some argue that data quality problems arise from different departments collecting data in incompatible ways. Overall, the sentiment leans towards the need for more upstream accountability for data quality.
55 comments
Posted in r/dataengineering by u/stephen8212438 on 11/5/2025
176

When the pipeline stops being “a pipeline” and becomes “the system”

Career
Many professionals relate to the experience of a temporary ETL (Extract, Transform, Load) job evolving into a critical system, likening it to Facebook's `dim_all_users` and other complex data handling systems. This transformation, while challenging and often unexpected, is generally perceived as a sign of the system's value. However, it increases pressure on maintenance and can lead to significant tech debt. The change often brings added complexity, such as adapting to multiple customers and building real-time monitoring systems. The sentiment is mixed, with some viewing it as an inevitable result of growth and others expressing frustration over the lack of sustainable design from the outset.
20 comments
Posted in r/dataengineering by u/shanksfk on 11/5/2025
159

Is work life balance in data engineering is non-existent?

Career
Data engineers are grappling with work-life balance issues, experiencing constant pressure and an endless flow of tasks. However, many believe that this balance is largely self-determined, advising others to set boundaries and sign off when the work is done. This approach usually isn't met with resistance, and if it is, that may be a sign to consider a job change. The culture and expectations of employers can greatly influence work-life balance, with some sectors like insurance offering more stability and less pressure. Personal factors such as anxiety and difficulty setting boundaries can also contribute to poor work-life balance. Overall sentiment is mixed, with some accepting the demanding nature of the field and others advocating for balance and self-care.
89 comments
Posted in r/MachineLearning by u/Large-Status2352 on 11/3/2025
97

[R] We were wrong about SNNs. The bottleneck isn't binary/sparsity, it's frequency.

Research
The performance gap between Spiking Neural Networks (SNNs) and Artificial Neural Networks (ANNs) stems not from information loss caused by binary activations, as previously believed, but from the low-pass filtering behavior of spiking neurons. This causes high-frequency components to dissipate quickly, reducing the effectiveness of feature representations. Correcting this 'astigmatism' with lightweight max-pooling and depthwise convolution (DWC) operations has led to significant improvements in accuracy. The research, which offers a fresh perspective on SNNs' performance bottlenecks, suggests that optimizing SNNs may not be about mimicking ANNs but about exploring SNNs' unique properties. The community responded positively to the research, though some suggested clearer definitions and link corrections.
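The low-pass claim can be illustrated with a deliberately simplified sketch. It models only the leaky membrane integration of a spiking neuron and omits the spike threshold and reset, so it is not the model from the post; the leak factor `beta` and the two test frequencies are assumptions picked for illustration.

```python
import math

def leaky_integrate(signal, beta=0.9):
    """First-order leaky integrator: v[t] = beta * v[t-1] + (1 - beta) * x[t]."""
    v, out = 0.0, []
    for x in signal:
        v = beta * v + (1.0 - beta) * x
        out.append(v)
    return out

def steady_state_amplitude(cycles_per_step, beta=0.9, steps=2000):
    """Largest sampled output magnitude for a unit-amplitude sine input, after transients."""
    x = [math.sin(2 * math.pi * cycles_per_step * t) for t in range(steps)]
    y = leaky_integrate(x, beta)
    # Only inspect the second half so the start-up transient has decayed.
    return max(abs(v) for v in y[steps // 2:])

low = steady_state_amplitude(0.01)   # slow input: 100 steps per cycle
high = steady_state_amplitude(0.25)  # fast input: 4 steps per cycle

print(f"low-frequency amplitude:  {low:.3f}")
print(f"high-frequency amplitude: {high:.3f}")
```

Under these assumptions the slow input passes through nearly intact while the fast input is strongly attenuated, which is the kind of high-frequency loss the summary describes.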
26 comments
Posted in r/dataengineering by u/Hot-Personality-7847 on 11/9/2025
89

Snowflake to Databricks Migration?

Discussion
Many Reddit users have noted a trend of companies switching between Databricks and Snowflake, often driven by better deals offered by either platform. The migration direction does not appear to be consistent, with some users observing a move from Databricks to Snowflake, while others have seen the opposite. Commenters suggest the choice might be influenced by each platform's compatibility with Azure and AWS. Users also noted that large organizations often maintain both platforms as a strategic move. However, some are skeptical about the effectiveness of these migrations, citing recurring issues caused by poor practices that persist despite platform changes.
47 comments
Posted in r/SQL by u/MarkusWinand on 11/5/2025
66

Group by all: A popular, soon-to-be-standard SQL feature

Oracle
The 'Group by all' feature, soon to be part of the SQL standard, has sparked a diverse range of opinions. While some users find it extremely helpful for exploratory queries, others emphasize the need for explicitness in production code to avoid unintended grouping and ensure efficient database performance. Skepticism arises around the adoption rate of the feature and its ability to maintain SQL's declarativity. Some question why this seemingly obvious feature wasn't implemented sooner, while others note that few server platforms support it so far. The sentiment is mixed, reflecting both anticipation and caution among users.
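For readers unfamiliar with the feature, here is a minimal sketch of what the shorthand stands for. The `sales` table and its rows are hypothetical. The example uses Python's built-in `sqlite3` module only for convenience, and it assumes SQLite does not accept `GROUP BY ALL`, so the shorthand appears as a comment and the explicit form is the one that runs.

```python
import sqlite3

# Hypothetical table and data, used only to illustrate the grouping behavior.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("EU", "widget", 10),
        ("EU", "widget", 5),
        ("EU", "gadget", 7),
        ("US", "widget", 3),
    ],
)

# In an engine that supports the shorthand, the query could be written as:
#   SELECT region, product, SUM(amount) AS total
#   FROM sales
#   GROUP BY ALL
# GROUP BY ALL groups by every non-aggregated expression in the SELECT list,
# which here means region and product. The explicit equivalent is:
explicit_query = """
    SELECT region, product, SUM(amount) AS total
    FROM sales
    GROUP BY region, product
    ORDER BY region, product
"""

rows = conn.execute(explicit_query).fetchall()
for row in rows:
    print(row)
```

The explicit form is what the "production code" camp in the thread prefers: adding a column to the `SELECT` list cannot silently change the grouping.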
31 comments
Posted in r/SQL by u/CapEasy41625 on 11/7/2025
63

New Orleans

MySQL
This discussion centers on a graffiti sighting in New Orleans. The reaction to the post is minimal and neutral, with one commenter acknowledging it simply with "Indeed." Another comment made a humorous reference to SQL. With limited engagement and no clear consensus, the sentiment surrounding this graffiti remains largely undetermined.
2 comments
