
data subTLDR week 36 year 2025

r/MachineLearning, r/dataengineering, r/SQL

Debating a Tough Senior Dev Interview Question, Transitioning from SQL*Plus to MySQL, Seeking PopSQL Alternatives, Discussing Business Teams' Access to BigQuery MCP, Pondering Data Modeling's Decline

Posted in r/dataengineering by u/full_arc on 9/5/2025
496

Giving the biz team access to BigQuery MCP

Meme
Commenters are clearly skeptical about giving business teams access to BigQuery MCP. Many raise concerns about potential exploits, comparing MCP implementations to SQL injection attacks but with AI powers. The idea that business teams would be proficient with a Git-driven dbt workflow, or comfortable using a CLI plus Git plus Jinja on top of SQL, is met with disbelief. Users also warn about mishandling of sensitive data, such as Social Security numbers. However, some users see Looker MCP as a useful layer of defense between end users and unrestricted data queries. The overall sentiment is mixed to negative.
28 comments
Posted in r/SQL by u/MinimumVegetable9 on 9/5/2025
329

Senior Dev (Fintech) Interview Question - Too hard?

SQL Server
The Reddit community weighed in on a hiring manager's SQL test for senior developers, which no candidates had yet passed. In general, participants critiqued the test for conflating different areas of expertise like data modeling, data quality policy, ETL normalization, and join-performance engineering. The most upvoted comment suggested several improvements to the test, including normalizing phone numbers and emails, reshaping tables for efficient joins, and providing clear business rules for handling duplicate identifiers. Other participants pointed out that the test's complexity and the lack of permissions to index could deter experienced developers. The overall sentiment was negative toward the test and the hiring manager's expectations.
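The normalization step suggested in the top comment could look roughly like the sketch below. The test itself is not reproduced in this summary, so the rules here are assumptions for illustration: emails are trimmed and lowercased, phone numbers are reduced to digits, and a leading US country code is dropped. The function names are hypothetical.

```python
import re


def normalize_email(raw: str) -> str:
    """Trim whitespace and lowercase so equal addresses compare equal."""
    return raw.strip().lower()


def normalize_phone(raw: str) -> str:
    """Keep digits only; drop a leading '1' country code (assumes US numbers)."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


# Differently formatted inputs collapse to one join key.
phones = ["+1 (555) 010-0199", "555.010.0199", "5550100199"]
phone_keys = {normalize_phone(p) for p in phones}

emails = ["  Ana@Example.COM ", "ana@example.com"]
email_keys = {normalize_email(e) for e in emails}

print(phone_keys)
print(email_keys)
```

Storing the normalized value in its own column lets joins compare plain equal values instead of applying string functions on every row, which is one way to read the "reshape tables for efficient joins" advice.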
203 comments
Posted in r/dataengineering by u/DryRelationship1330 on 9/3/2025
287

Confirm my suspicion about data modeling

Career
The era of data modeling may indeed be fading, driven by a shift towards fast delivery over consistent, trusted data platform design. Commenters largely attribute this shift to business pressures, technology advancements, and a lack of foundational knowledge among data professionals. Some users believe that the decline in data modeling quality is due to a focus on quick wins and the adoption of new architectural frameworks without proper structure. The rapid pace of technology and frequent platform migrations have also contributed to the perceived devaluation of data modeling. Despite these challenges, many still argue that data modeling remains essential, largely because of the adoption of BI and data visualization software. The sentiment in the thread is mixed, with some users expressing frustration at the current state of the industry, while others see it as an inevitable evolution of the field.
116 comments
Posted in r/MachineLearning by u/crookedstairs on 9/5/2025
264

[D] An ML engineer's guide to GPU performance

Discussion
A comprehensive guide to GPU performance metrics was shared, receiving positive feedback for its content and usefulness. However, the aesthetics of the site sparked mixed reactions. The design, specifically its typography and color scheme, was considered appealing by some, but others found it difficult to read, causing strain and even migraines. Several users requested a version with readability prioritized over aesthetics. Despite the design concerns, users expressed interest in the guide and praised the effort put into creating it. The designer was also acknowledged for their work. The overall sentiment was constructive criticism with appreciation for the content provided.
22 comments
Posted in r/dataengineering by u/TheStar1359 on 9/6/2025
259

How did I go from copy pasting CSVs at 2am to building dashboards execs clap for?

Discussion
The discussion primarily revolves around experiences with imposter syndrome in the data industry. Many professionals relate to the feeling of the ground constantly shifting, and the struggle to adapt to rapid changes. Several contributors noted that confidence grows over time but self-doubt is common, even among seasoned professionals. Strong emphasis is placed on the importance of continuous learning and adapting in a rapidly evolving field. The overall sentiment is positive, with a sense of shared understanding and encouragement to embrace the feeling of discomfort as a sign of growth rather than a mark of inadequacy.
23 comments
Posted in r/dataengineering by u/Fonduemeup on 9/7/2025
216

After 8 years, I'm thinking of calling it quits

Discussion
The thread discusses the growing disillusionment of a data professional with eight years of experience in the field, citing overwork, poor management, and the increasing reliance on AI as key frustrations. Contributors echo these sentiments, particularly highlighting the lack of mentorship and difficulties navigating constant changes. Some suggest that these challenges aren't unique to data roles but are inherent in corporate work, advising others to find fulfillment outside of work. However, this was countered by a reminder of the significant portion of our lives dedicated to work and the need for it to be inherently worthwhile. Several comments stressed the importance of managing AI as a productivity tool rather than a means to reduce headcount. The overall sentiment is mixed, reflecting a sense of dissatisfaction with the current state of the profession but also offering potential coping strategies and solutions.
67 comments
Posted in r/MachineLearning by u/OkOwl6744 on 9/7/2025
114

Why Language Models Hallucinate - OpenAi pseudo paper - [D]

Discussion
The conversation revolves around the issue of 'hallucinations' in AI language models, such as OpenAI's GPT-5. Many agree that these hallucinations stem from the pressure on models to provide an answer, even when uncertain, as acknowledging uncertainty can lead to the model being deemed 'lazy'. Some propose introducing an 'I don't know' option to mitigate this problem. However, concerns are raised about the inherent difficulty in determining whether a model knows it's wrong. One user criticizes OpenAI for not effectively addressing these issues, while others argue that these challenges are fundamental to language models. The sentiment is largely critical yet constructive.
48 comments
Posted in r/MachineLearning by u/Infinite_Explosion on 9/4/2025
82

[D] How do you read code with Hydra

Discussion
Users discussing Hydra, a popular configuration framework in machine learning projects, expressed mixed sentiments. While recognizing its benefits in making configurations modular and reusable, users found Hydra-based code hard to read because objects are instantiated implicitly from config. A highly upvoted comment suggested limiting Hydra to hyperparameter sweeps and pushing consistent elements into code for easier tracking. Experiment configs were also recommended for reusability. Tools like Hydra-Zen and Pydantic were suggested for more straightforward configurations. Some users voiced concerns over the difficulty of extending code and keeping team members in sync, advocating for a balance between code-driven and config-driven experimentation. Overall, the thread points to a need for effective strategies to get the most out of Hydra.
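The readability complaint is easier to see in a small example. The sketch below is a simplified stand-in for config-driven instantiation, not Hydra's actual implementation: only the `_target_` key follows Hydra's convention, while the helper `instantiate_from_config` and the choice of `fractions.Fraction` as the target class are hypothetical and chosen so the snippet runs with the standard library alone.

```python
import importlib


def instantiate_from_config(cfg: dict):
    """Build an object from a dotted path stored under '_target_'.

    Illustrative helper only; it mimics the idea behind config-driven
    instantiation without using Hydra itself.
    """
    module_path, _, attr_name = cfg["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_path), attr_name)
    kwargs = {key: value for key, value in cfg.items() if key != "_target_"}
    return target(**kwargs)


# In a real project this dict would typically live in a YAML file.
cfg = {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}

# Implicit: the class name never appears in the code, only in the config,
# so "go to definition" and text search on the call site find nothing.
obj = instantiate_from_config(cfg)
print(type(obj).__name__, obj)
```

Writing `Fraction(3, 4)` directly would make the call traceable in an editor, which is the trade-off behind the advice to keep stable components in code and reserve config for values that actually vary between experiments.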
33 comments
Posted in r/SQL by u/prabhuverse on 9/1/2025
54

Exploring SQL: From SQL*Plus to MySQL

Discussion
The discussion revolves around the transition from SQL*Plus to MySQL. Participants emphasize the importance of not confusing client tools with the database product: SQL*Plus is a command-line client for Oracle, while MySQL is a database management system (DBMS). They criticize MySQL for its many deviations from standard SQL, suggesting that it might not be the best choice for beginners. Participants recommend client tools such as DBeaver, HeidiSQL, SQL Developer, and VS Code for a better learning experience. They also propose livesql.oracle.com for learning Oracle Database specifically, and suggest SQLcl as a modern command-line client for Oracle. The overall sentiment is mixed.
14 comments
Posted in r/SQL by u/wtfstim on 9/3/2025
38

PopSQL announced it is shutting down. Need an alternative.

Discussion
PopSQL, a popular tool for database collaboration and management, is shutting down, sparking a discussion of alternatives. A clear frustration is that such platforms are perceived as over-funded and then fail when growth and return on investment fall short of expectations. Some suggest Hex.tech as a viable alternative, while others mention DBeaver Teams, SQLPad, and Metabase, albeit conceding these options may lack PopSQL's user-friendly interface. The discussion underlines the need for an all-in-one tool that provides security, user management, and fair pricing, with a built-in AI helper seen as a significant plus. The overall sentiment is mixed.
22 comments
Posted in r/SQL by u/KBHAL on 9/7/2025
34

purpose of coalesce

Discussion
COALESCE is a standard SQL function, available across dialects, that evaluates its arguments in order and returns the first non-null one, essentially acting as shorthand for a CASE expression. Care should be taken with data types, because COALESCE follows data type precedence rules that can affect the output when the arguments have mixed types. It also does not treat empty strings as null, which should be kept in mind to avoid unexpected results. Some users highlighted its usefulness with outer joins and its potential to improve results in well-normalized data scenarios.
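A minimal sketch of this behavior, using Python's built-in `sqlite3` module so it runs without a database server. The `contacts` table and its columns are made up for illustration, and the result reflects SQLite; other dialects can differ in how they handle types and empty strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, mobile TEXT, landline TEXT)")
conn.executemany(
    "INSERT INTO contacts VALUES (?, ?, ?)",
    [
        ("Ana", "555-0100", "555-0199"),  # both numbers present
        ("Ben", None, "555-0142"),        # mobile is NULL
        ("Cy", "", "555-0177"),           # mobile is an empty string, not NULL
        ("Dee", None, None),              # nothing on file
    ],
)

rows = conn.execute(
    """
    SELECT name,
           -- first non-null argument, checked left to right
           COALESCE(mobile, landline, 'n/a') AS plain,
           -- NULLIF turns '' into NULL so the fallback applies
           COALESCE(NULLIF(mobile, ''), landline, 'n/a') AS blank_as_null
    FROM contacts
    ORDER BY name
    """
).fetchall()

for row in rows:
    print(row)
```

For Cy, the plain COALESCE returns the empty string because it is not null, while the NULLIF variant falls through to the landline. Wrapping a column in `NULLIF(col, '')` is a common way to make blank values behave like missing ones.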
20 comments
