data subTLDR week 22 year 2025

r/MachineLearning • r/dataengineering • r/SQL

Engaging SQL Learning Through Free Games, Exploring SQL Experience Expectations, Team Chaos vs. SQL Libraries, Salesforce's $8B Informatica Acquisition, and Mistakenly Nuking Dashboards: Lessons Learned

June 1, 2025 • Week 22, 2025

Posted in r/dataengineering by u/putt_stuff98 • 5/27/2025

425

Salesforce agrees to buy Informatica for 8 billion

Discussion

Salesforce's agreement to buy Informatica for $8 billion drew mixed reactions that lean skeptical. Many users, critical of Informatica's product quality and business practices, predict the company will decline under Salesforce's ownership, much as they believe Tableau did. Critics argue that Informatica is kept alive mainly by its sales team and legacy implementations rather than by a superior product. Some users see the acquisition as a strategic move by Salesforce to compete with data-centric companies like Databricks and Snowflake, but concerns persist about a lack of synergy and Salesforce's scattered focus.

190 comments

Posted in r/dataengineering by u/SocioGrab743 • 5/27/2025

387

I just nuked all our dashboards

Help

The poster, who is not a data engineer, was left in charge of dashboards and dropped tables in BigQuery after office hours, which took every dashboard down. He restored everything in about 20 minutes but is overwhelmed and unsure about the repercussions. Top comments stress that dropping tables without full sign-off is a critical mistake, especially when the downstream dependencies are not understood. They also caution against renaming tables or columns unless the downstream data pipelines, dashboards, and integrations are well understood. The overall sentiment is mixed, combining empathetic advice with stern warnings.
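The commenters' advice about checking downstream dependencies before dropping anything can be illustrated with a small sketch. The lineage map, table names, and helper functions below are all hypothetical; in a real warehouse the map would have to come from a lineage tool or query logs, which is assumed here rather than shown.

```python
from collections import deque

# Hypothetical lineage map: each object -> the objects that read from it directly.
# Assumption: in practice this is derived from lineage metadata, not hand-written.
DOWNSTREAM = {
    "raw.orders": ["staging.orders_clean"],
    "staging.orders_clean": ["marts.daily_sales", "dashboard:ops_overview"],
    "marts.daily_sales": ["dashboard:sales_kpis"],
    "tmp.scratch_2024": [],
}


def downstream_of(table, graph):
    """Return every object that depends on `table`, directly or transitively."""
    seen = []
    queue = deque(graph.get(table, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.append(node)
        queue.extend(graph.get(node, []))
    return seen


def safe_to_drop(table, graph):
    """Treat a table as safe to drop only if nothing reads from it."""
    return not downstream_of(table, graph)
```

Calling `downstream_of("raw.orders", DOWNSTREAM)` lists both dashboards that would break, while `safe_to_drop("tmp.scratch_2024", DOWNSTREAM)` is true because nothing reads from that table.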

154 comments

Posted in r/dataengineering by u/betonaren • 5/26/2025

331

scrum is total joke in DE & BI development

Discussion

The consensus is that Scrum, as implemented in data engineering and business intelligence development, often fails because it clashes with the unpredictable, exploratory nature of the work. Many developers feel overcommitted and struggle to give accurate time estimates. Several users propose remedies such as breaking tasks into smaller pieces and taking a structured, logical approach to exploratory work. Others suggest a Kanban-style workflow, which makes bottlenecks visible and keeps the focus on one task at a time. Some argue that Scrum failures stem from incorrect application rather than from inherent flaws in the methodology. Overall sentiment leans negative, with suggestions for improvement.

117 comments

Posted in r/MachineLearning by u/Specialist_Square818 • 5/27/2025

330

[R] Bloat in machine learning shared libs is >70%

Research

The award-winning paper "The Hidden Bloat in Machine Learning Systems" presents Negativa-ML, an open-source tool that significantly reduces device and host code sizes in machine learning frameworks, which in turn lowers peak memory usage and execution time. The paper identifies device code as a main source of bloat in these frameworks. The community response is largely supportive, with some noting that the bloat is unsurprising given that many machine learning engineers are not deeply versed in GPU programming. Others suggest that the tool's performance measurements may be flawed and recommend different configurations for more accurate results. The sentiment is generally positive, with users sharing their experiences of large, inefficient codebases and expressing optimism about further debloating of ML systems.

15 comments

Posted in r/SQL by u/chrisBhappy • 5/29/2025

329

I put together a list of 5 free games to practice SQL

MySQL

The post shares a list of free browser-based games for practicing SQL, including the recently launched SQLNoir, and received positive feedback overall. Users appreciated the fun, investigative approach to learning SQL and look forward to the games growing in complexity. The games have reportedly been used successfully in an educational setting. Some users ran into technical issues, including unsecured-connection warnings. Even so, the sentiment is generally positive and users are eager to explore these resources further.

22 comments

Posted in r/MachineLearning by u/Dev-Table • 6/1/2025

265

[P] Interactive Pytorch visualization package that works in notebooks with 1 line of code

Project

The open-source package 'torchvista' is gaining positive attention for interactive PyTorch model visualization in web-based notebooks. Its features include modular exploration, clear tensor shape views, error tolerance, and notebook support. Users appreciated that it renders partial graphs when a model fails, which helps with debugging. It is compatible with newer PyTorch models, including Transformer and Mamba architectures. Some users hope for further development and extensibility, particularly for deep supervision or mechanistic interpretability research projects. Overall, the sentiment was highly positive, with users seeing it as a useful tool for their work.

23 comments

Posted in r/MachineLearning by u/nickfox • 5/26/2025

216

[D] Grok 3's Think mode consistently identifies as Claude 3.5 Sonnet

Discussion

The AI community debates an unusual behavior in xAI's Grok 3, which identifies as Claude 3.5 Sonnet when in Think mode. The prevailing explanation is that Grok may have been trained on a significant amount of Claude's output, suggesting the Grok pretraining team overlooked crucial filtering. Some users tested and verified the behavior. Others pointed out that large language models (LLMs) have historically struggled with self-identification, so the issue is neither new nor unique. A minority expressed cynicism, suggesting Grok could simply be a rebranded model. The sentiment is mixed, with a call for more reliable AI identity management.

50 comments

Posted in r/MachineLearning by u/Radiant_Situation340 • 5/30/2025

198

[R] The Resurrection of the ReLU

Research

A new preprint revisits the ReLU activation function in neural networks and introduces a method called SUGAR (Surrogate Gradient Learning for ReLU). The method replaces ReLU's derivative with a smooth surrogate gradient, which reportedly improves convergence and generalization and yields consistent accuracy gains. Some commenters asked why one would not simply use the function that supplies the replacement gradient in the first place; replies clarified that networks generalize better with the proposed method. Concerns were raised about the impact on training speed, since computational simplicity is a major benefit of ReLU. Some users also expressed interest in testing the method in their own work. Overall, the sentiment is mixed, with excitement about potential improvements and skepticism about practical implications.
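To make the surrogate-gradient idea concrete, here is a minimal scalar sketch in plain Python. It assumes the usual reading of such methods: the forward pass stays plain ReLU and only the backward pass changes. The surrogate used here, the derivative of SiLU, is an illustrative assumption and not necessarily the one the preprint proposes; all function names are hypothetical.

```python
import math


def relu(x):
    """Forward pass: plain ReLU, left unchanged."""
    return max(0.0, x)


def relu_grad(x):
    """Standard ReLU derivative: exactly zero for non-positive inputs."""
    return 1.0 if x > 0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def surrogate_grad(x):
    """Derivative of SiLU, x * sigmoid(x), used as a stand-in smooth surrogate.

    Assumption: chosen for illustration; the preprint's surrogate may differ.
    """
    s = sigmoid(x)
    return s + x * s * (1.0 - s)


def backward(x, upstream, use_surrogate=True):
    """Backward pass through the activation at pre-activation value x."""
    local = surrogate_grad(x) if use_surrogate else relu_grad(x)
    return upstream * local
```

For a negative pre-activation such as `x = -2.0`, `relu(x)` is zero either way, but `backward(x, 1.0, use_surrogate=False)` passes no gradient while `backward(x, 1.0)` passes a small nonzero one, so a unit that is currently inactive can still be updated.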

55 comments

Posted in r/SQL by u/PortalRat90 • 5/27/2025

165

What is SQL experience?

SQL Server

What counts as SQL experience varies widely and depends on the job role. For example, a Data Engineer would need DDL experience, while a Data Analyst would need DML and window functions. Key skills include the different types of joins, WHERE and HAVING clauses, and the use of CTEs and subqueries. Mastery of these basics is often expected even for entry-level roles. Even experts admit, however, that SQL has a vast array of platform-specific features that are rarely used or known. The overall sentiment encourages applying for roles despite knowledge gaps, with continuous learning seen as part of SQL proficiency.
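As a rough illustration of the skills listed above, the sketch below runs one query that combines a CTE, a join, WHERE, HAVING, and a window function. It uses Python's built-in sqlite3 module with an in-memory database; the tables, names, and amounts are invented for the example, and other platforms may differ in syntax details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema (hypothetical tables).
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
""")

# DML: load a few made-up rows.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Linus")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 50.0), (2, 1, 70.0), (3, 2, 20.0), (4, 2, 15.0), (5, 3, 200.0)],
)

query = """
WITH totals AS (                               -- CTE
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id   -- inner join
    WHERE o.amount > 10                        -- filters rows before grouping
    GROUP BY c.name
    HAVING SUM(o.amount) >= 100                -- filters groups after aggregation
)
SELECT name, total,
       RANK() OVER (ORDER BY total DESC) AS spend_rank   -- window function
FROM totals
ORDER BY spend_rank
"""
rows = conn.execute(query).fetchall()
```

Here Grace's orders pass the WHERE filter but her total of 35 fails the HAVING condition, so only Linus and Ada are ranked.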

80 comments

Posted in r/SQL by u/jspectre79 • 5/30/2025

104

Does your team have a SQL library… or just chaos?

Discussion

The discussion reveals a common problem: in many teams, SQL queries are poorly organized and documented, which leads to inefficiency. Several participants use Git repositories to manage and document their SQL. Others suggest using stored procedures and structuring the data instead of sharing code. The sentiment leans toward frustration with the current state of affairs and a desire for more structured systems. Views differ on the best solution, with some advocating a SQL library and others emphasizing the importance of turning broadly accepted SQL into tables or views.
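The suggestion to turn broadly accepted SQL into views can be sketched as follows, again with Python's built-in sqlite3 module. The `orders` table, the `revenue_by_region` view, and the rule that only paid orders count as revenue are hypothetical; the point is only that the agreed definition lives in the database instead of being copied between analysts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL, status TEXT);
INSERT INTO orders VALUES
    (1, 'EU', 100.0, 'paid'),
    (2, 'EU', 40.0, 'refunded'),
    (3, 'US', 75.0, 'paid');

-- The agreed definition of "revenue" is stored once, as a view.
CREATE VIEW revenue_by_region AS
SELECT region, SUM(amount) AS revenue
FROM orders
WHERE status = 'paid'
GROUP BY region;
""")

# Consumers query the view instead of re-implementing the filter.
revenue = dict(conn.execute("SELECT region, revenue FROM revenue_by_region").fetchall())
```

Keeping the view's `CREATE VIEW` statement in a Git repository would combine both approaches mentioned in the thread.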

94 comments

Subscribe to data-subtldr

Get weekly summaries of top content from r/dataengineering, r/MachineLearning and more directly in your inbox.
