
data subTLDR week 43 year 2025

r/MachineLearning, r/dataengineering, r/SQL

Securing Analyst Roles: A Blend of Technical Skill and Real-World Presentation; SQL Interview Strategies: The Battle of Brand vs Category; Navigating Recruiter Queries: The Fine Line Between Confidence and Overselling; OpenDBT: A Fork in the Road for dbt-core's Future; Stepping Into Senior Data Engineering: A Mix of Learning, Licensing, and Laughter

Posted in r/dataengineering by u/aleda145 on 10/26/2025
743

Please keep your kids safe this Halloween

Meme
The Reddit community has engaged in a lively discussion around child safety during Halloween, with an undercurrent of technology-related humor. The most upvoted comment jokes about the high cost of candy bars, seemingly a metaphor for expensive CPU cores. There are also references to 'vendor lock-in' and 'tnsnames.ora', reflecting some discontent with certain technical aspects. Some comments express concern about what children are fed, with one user jokingly suggesting such practices could be seen as 'child abuse'. Overall, the thread combines light-hearted humor with tech industry insights, and the sentiment is mixed.
11 comments
Posted in r/dataengineering by u/gelyinegel on 10/22/2025
341

dbt-core fork: OpenDBT is here to enable community

Open Source
Concerns about the stagnation of dbt-core and its neglect of community contributions have led to the creation of the OpenDBT fork. This initiative aims to foster community contributions, extend dbt to user-specific needs, and enrich the open-source version. Many in the data community express support and optimism for this effort, while some question the identity of the developers behind the fork and the sustainability of the project. Others worry about the fork's reliance on dbt-core, since it could become obsolete if dbt-core changes its license. The initiative welcomes collaboration and contributions from developers and the wider data community.
36 comments
Posted in r/dataengineering by u/Uncle_Snake43 on 10/23/2025
300

Just got hired as a Senior Data Engineer. Never been a Data Engineer

Career
A newly hired Senior Data Engineer receives mixed advice and commentary from the Reddit community. Some commenters recommend spending the first 90 days learning and solving problems, while others suggest setting a standard that allows for coasting. There is a consensus against using the title Professional Engineer without passing the licensing exam, since that title carries specific qualifications. Some users describe challenges in their own roles, such as complex data manipulation tasks and high expectations. There is a humorous undertone as users suggest alternative job titles reflecting the diverse challenges in the field. The overall sentiment is mixed, combining congratulations, advice, and a dose of reality.
94 comments
Posted in r/dataengineering by u/MikeDoesEverything on 10/20/2025
278

[Megathread] AWS is on fire

Discussion
AWS recently experienced outages that caused significant disruption. The root cause appears to be a faulty DNS entry for the DynamoDB API, which left services unable to reach DynamoDB. Users in turn could not access those services, because the resources they depend on could not be resolved. Interestingly, systems operating mainly in other regions seemed to be unaffected, even if they ran some operations in the affected us-east-1 region, presumably because they kept access to their own regional DynamoDB. Reactions ranged from humor to frustration. The issue has since been largely resolved, though some errors may still be occurring.
63 comments
Posted in r/SQL by u/Various_Candidate325 on 10/24/2025
246

Finally got an offer for an analyst role

Discussion
The discussion emphasizes the importance of not only possessing strong technical skills in SQL and data analytics, but also being able to perform under pressure, communicate findings effectively to non-technical stakeholders, and demonstrate real-world value. Participants suggest focusing on either the visualization or SQL side of data analysis, depending on individual preferences and career goals. They also stress the value of grounding oneself in data concepts and methodologies, such as Kimball methodology and OLTP versus OLAP. The sentiment is largely positive, with commentators congratulating the original poster and sharing their own experiences and insights.
19 comments
Posted in r/MachineLearning by u/Adventurous-Cut-7077 on 10/22/2025
167

[N] Pondering how many of the papers at AI conferences are just AI generated garbage.

News
Concern is rising about the use of AI to generate forged scientific papers, with paper mills in China reportedly producing large volumes. Many commenters believe this is a global issue rather than a localized one. Some reviewers at notable AI conferences claim to have come across AI-generated papers, leading to intense debates. However, others maintain that top-tier conferences remain largely unaffected. AI authorship also raises questions about the validity of a paper, with some arguing that authorship does not matter as long as the paper is factually correct. The sentiment is mixed, with an urgent call for measures to ensure reproducibility and authenticity.
55 comments
Posted in r/MachineLearning by u/No_Marionberry_5366 on 10/22/2025
159

[N] Open AI just released Atlas browser. It's just accruing architectural debt

News
The release of OpenAI's Atlas browser has stirred a range of opinions. Some suggest the move is primarily meant to gather training data, while others highlight that website creators have little incentive to provide an API for AI agents, since agents do not generate ad revenue. Some users expressed skepticism towards the post's promotion of certain companies. It was noted that web scraping for GPT led to many APIs becoming private in 2023, including those of Twitter and Reddit. The overall sentiment is mixed, with some seeing potential benefits but many voicing concerns and doubts about the feasibility and intention behind the initiative.
87 comments
Posted in r/SQL by u/fokass on 10/22/2025
99

Had a SQL interview today

Discussion
In an SQL interview scenario, the interviewee was asked to find the top two brands from each category sorted by sales. The interviewee's approach involved creating a dense_rank partitioned by category and sorted by sales in descending order. The majority of Reddit comments agreed that this approach was correct but there were varying opinions on whether to partition by brand or category. Some users suggested summing the sales for each brand before creating the ranking, assuming multiple sales entries for each brand. Others indicated that partitioning by both brand and category could result in every brand being ranked first, making the solution less effective. The overall sentiment was mixed, with many users providing alternative solutions and insights.
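The approach described in the thread can be sketched as follows. This is only an illustration, not the interviewee's actual query: the `sales` table, its `category`, `brand`, and `amount` columns, and the sample rows are all hypothetical. It sums sales per brand first, as some commenters suggested, then applies `DENSE_RANK()` partitioned by category. Python's built-in `sqlite3` module is used so the SQL runs without a database server (window functions need SQLite 3.25 or newer).

```python
import sqlite3

# Hypothetical sample data, invented purely for illustration:
# (category, brand, amount), with several sales rows per brand.
ROWS = [
    ("snacks", "Acme", 120), ("snacks", "Acme", 80),
    ("snacks", "Bolt", 150), ("snacks", "Crisp", 90),
    ("drinks", "Fizz", 300), ("drinks", "Gulp", 100),
    ("drinks", "Gulp", 250), ("drinks", "Hydr", 40),
]

QUERY = """
WITH brand_totals AS (
    -- Step 1: collapse multiple sales rows into one total per brand.
    SELECT category, brand, SUM(amount) AS total_sales
    FROM sales
    GROUP BY category, brand
),
ranked AS (
    -- Step 2: rank brands inside each category by total sales.
    SELECT category, brand, total_sales,
           DENSE_RANK() OVER (
               PARTITION BY category
               ORDER BY total_sales DESC
           ) AS sales_rank
    FROM brand_totals
)
-- Step 3: keep the top two ranks per category.
SELECT category, brand, total_sales
FROM ranked
WHERE sales_rank <= 2
ORDER BY category, sales_rank, brand;
"""


def top_two_brands(rows):
    """Load rows into an in-memory table and return the top two brands per category."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE sales (category TEXT, brand TEXT, amount INTEGER)"
        )
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in top_two_brands(ROWS):
        print(row)
```

In this sample data, ranking the raw rows without summing would place Bolt's single 150 sale above either of Acme's individual rows, even though Acme sells more in total, which is why the aggregation step matters. Adding `brand` to the `PARTITION BY` clause after aggregating would leave one row per partition, so every brand would get rank 1.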
37 comments
Posted in r/SQL by u/Time-Leading2331 on 10/22/2025
57

Vague recruiter question - "Do you have excellent SQL skills?"

MySQL
The discussion is centered around the respondent's approach to a recruiter's question about their SQL skills. Most participants believe that the recruiter, likely not technically proficient, was seeking a simple affirmative. Some appreciate the nuanced response, viewing it as a sign of humility and understanding of the breadth of SQL knowledge. Others suggest more direct questioning of the recruiter to clarify their requirements. An undercurrent of advice is to remain honest and not oversell skills, as a good job fit should appreciate a truthful response. The sentiment is predominantly constructive, with understanding of the challenging job market.
50 comments
Posted in r/MachineLearning by u/dogecoinishappiness on 10/22/2025
57

[R] Why do continuous normalising flows produce "half dog-half cat" samples when the data distribution is clearly topologically disconnected?

Research
The discussion revolves around the role of diffeomorphisms in continuous normalizing flows (CNFs) when data distributions are topologically disconnected. The consensus is that the diffeomorphism is a necessity, not an assumption, because the training process needs differentiable and invertible functions. As a result, the flow preserves topological features. This becomes a problem when the true data distribution is disconnected, as the flow might create probability bridges between distinct data clusters. Suggestions to address this include introducing stochasticity, using non-invertible layers, augmenting dimensions, and using mixture components. The sentiment appears to be mixed, acknowledging the diffeomorphic constraint as both a strength and a limitation when data is not smooth.
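A toy one-dimensional example may help make the "probability bridge" concrete. It is not a CNF and is not taken from the thread: `toy_flow` and its parameters `a` and `k` are hypothetical, chosen only because the map is smooth and strictly increasing, which makes it a diffeomorphism of the real line. It pushes a standard Gaussian towards two clusters on either side of zero, yet because the map is continuous and invertible, some samples must still land in the gap between them.

```python
import math
import random


def toy_flow(z, a=3.0, k=5.0):
    """Hypothetical smooth, strictly increasing map on the real line.

    Its derivative is 1 + a*k / cosh(k*z)**2 > 0, so it is invertible
    and differentiable everywhere: a 1-D diffeomorphism.
    """
    return z + a * math.tanh(k * z)


def gap_fraction(n=100_000, half_width=1.0, seed=0):
    """Fraction of pushed-forward Gaussian samples landing in |x| < half_width."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = toy_flow(rng.gauss(0.0, 1.0))  # base sample -> "data" space
        if abs(x) < half_width:
            hits += 1
    return hits / n


if __name__ == "__main__":
    # Most mass is pushed away from zero into two clusters, but the
    # fraction in the gap stays strictly positive: the thin bridge.
    print(f"fraction of samples in the gap: {gap_fraction():.4f}")
```

Making `k` larger thins the bridge but cannot remove it, since any continuous invertible map sends the connected real line to a connected set. That is the intuition behind the thread's suggestions to relax invertibility or add stochasticity or mixture components.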
10 comments
