data subTLDR week 39 year 2025

r/MachineLearning, r/dataengineering, r/SQL

Exploring Fun in Tech Jokes, Effective Ways to Master SQL, Embracing a 'Generalist' Role, Insights on Data Engineering, Challenges in AI Roles

September 28, 2025 • Week 39, 2025

Posted in r/dataengineering by u/growth_man • 9/23/2025

1805

It's All About Data...

Meme

Commenters largely agreed on the importance of data engineering, with many emphasizing that the role covers critical functions beyond setting up pipelines. A prevalent theme was the distinction between data creation and data engineering, and the question of who is accountable for data quality. Many agreed that data engineers shouldn't be held responsible for fixing poor data and that the onus for repair lies with data owners. Commenters also noted that the lack of standardization in data job titles can cause confusion. The sentiment, although mixed, leans towards a call for better recognition and understanding of data engineers' roles.

41 comments

Posted in r/SQL by u/andrewlik • 9/24/2025

708

A joke from my uni's lecture slides

Discussion

The discussion revolved around sharing and appreciating dad jokes, particularly those with a tech or mathematical twist. Highly upvoted comments included puns about databases, software testing, and SQL queries. Some participants humorously critiqued the joke's structure, syntax, and delivery, but this did not detract from the overall enjoyment. The sentiment was positive and lighthearted.

19 comments

Posted in r/dataengineering by u/Background_Artist801 • 9/26/2025

704

Reality Nowadays…

Meme

The discussion centers around the challenges of working with unclean, unstructured data in artificial intelligence (AI) roles, particularly in the context of businesses that have not managed their data well. Many participants highlight the struggle of integrating and cleaning long-neglected data sources. A common sentiment is that despite the difficulty, there's a degree of job security in being the only one who understands the tangled data mess. The overall tone is one of commiseration and humor at the shared challenge, with a persistent motif comparing the situation to Sisyphus' eternal struggle.

16 comments

Posted in r/MachineLearning by u/Only_Emergencies • 9/24/2025

329

[D] Is senior ML engineering just API calls now?

Discussion

Many senior Machine Learning engineers are expressing concerns over their roles becoming more about integrating existing models through APIs and prompt engineering, rather than building and fine-tuning models from scratch. Many of them have noticed this shift in the industry since the boom of generative AI. While some find it frustrating, others point out that using APIs can often fulfill business needs more efficiently. Still, there is a sense of nostalgia for the more hands-on, experimental aspects of ML roles. Some individuals have transitioned to fields like econometrics for more challenging problems, while others suggest seeking roles in companies that require more than API calls. Overall sentiment is mixed.

125 comments

Posted in r/dataengineering by u/Potential_Loss6978 • 9/24/2025

304

How do I go from a code junkie to answering questions like these as a junior?

Discussion

Many junior data engineers feel overwhelmed by the complex scenarios discussed in their field, but experienced practitioners reassure them that these are not typical. Most data engineering work doesn't involve handling petabytes of data or real-time processing. The demand for real-time data often softens when the meaning of 'real-time' is clarified, revealing a more flexible delivery time. Designing high-scale, real-time systems is a niche skill, often unnecessary for mid-sized companies transitioning from Excel to more robust reporting solutions. Therefore, juniors should not be overly anxious about their ability to handle such scenarios.

103 comments

Posted in r/dataengineering by u/Jake-Lokely • 9/28/2025

252

Week 1 of learning pyspark.

Help

The user shared their week 1 experience of learning PySpark and sought feedback about their learning progress and future plans. The community responded positively, recommending practical experience over theoretical learning. They encouraged the user to work on a real project end-to-end, facing and overcoming challenges naturally. They also suggested understanding common use cases, data manipulations, and join strategies before diving deep into Spark's internals. Several users recommended a PySpark YouTube playlist as a valuable learning resource. Feedback on the user's plan for week 2 advised focusing on Spark optimization through hands-on practice and exploring other Data Engineering topics.

33 comments

Posted in r/MachineLearning by u/ParticularWork8424 • 9/23/2025

172

[D]: How do you actually land a research scientist intern role at a top lab/company?!

Discussion

Landing a research scientist intern role at top tech companies is challenging due to the high competition. The successful candidates often focus on quality research that has real-world applicability, rather than just aiming for publication in top-tier venues. They also bring deep domain expertise and understand why their models work. Networking and referrals play a significant role, with many successful candidates meeting with potential employers at conferences. While having a solid research profile is important, recruiters also value software skills and creativity. They typically check GitHub profiles and use interview time to assess applied math skills and conceptual thinking. Overall, the sentiment is positive but realistic about the intense competition for these roles.

51 comments

Posted in r/MachineLearning by u/simple-Flat0263 • 9/24/2025

[D] NeurIPS should start a journal track.

Discussion

The discussion on starting a journal track for NeurIPS reveals mixed views. Some suggest refocusing on existing journals like TMLR and JMLR and raising their standing. Others propose a NeurIPS Findings label to streamline the acceptance process for papers. A shared concern is the quality of conference reviews, with some describing them as a joke and arguing that journals offer a more reasonable choice. A few propose transforming NeurIPS into a separate conference or splitting it into smaller, focused conferences, primarily due to dissatisfaction with extensive travel and broad scope. The sentiment is largely critical, with calls for greater recognition of existing journals and improvement in review processes.

57 comments

Posted in r/SQL by u/1xEdmurtrichyx1 • 9/25/2025

I know SQL basics — what projects can I build to practice and get better?

Discussion

There's a positive sentiment towards using real-world datasets and personal interests to develop SQL skills. A popular idea is exploring StackOverflow's anonymized database or Kaggle's SQLite datasets for project inspiration. Others suggest solving real-world use case issues by designing a product or interface. Personal projects like expense trackers or book highlight analysis are also recommended. Similarly, working with messy data, setting up a mini data warehouse, and creating an analytics dashboard are seen as beneficial. Some users emphasized the importance of passion for a topic to maintain interest and drive learning over time.
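As a rough illustration of the expense-tracker idea, here is a minimal sketch using Python's built-in `sqlite3` module. The table name, columns, and sample rows are all hypothetical and are not taken from the thread; amounts are stored as integer cents to avoid floating-point rounding.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per expense, amount stored in cents.
conn.execute(
    """
    CREATE TABLE expenses (
        id           INTEGER PRIMARY KEY,
        spent_on     TEXT    NOT NULL,  -- ISO date string
        category     TEXT    NOT NULL,
        amount_cents INTEGER NOT NULL
    )
    """
)

# Made-up sample data for illustration only.
conn.executemany(
    "INSERT INTO expenses (spent_on, category, amount_cents) VALUES (?, ?, ?)",
    [
        ("2025-09-01", "groceries", 5420),
        ("2025-09-03", "transport", 1200),
        ("2025-09-10", "groceries", 3180),
        ("2025-09-15", "books", 2000),
    ],
)

# A first practice query: total spend per category, largest first.
totals = conn.execute(
    """
    SELECT category, SUM(amount_cents) AS total_cents
    FROM expenses
    GROUP BY category
    ORDER BY total_cents DESC
    """
).fetchall()

for category, total_cents in totals:
    print(f"{category}: {total_cents / 100:.2f}")
```

From a starting point like this, the practice comes from adding questions of your own, such as monthly trends, running totals with window functions, or joins against a separate budget table.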

25 comments

Posted in r/SQL by u/Creative_Oven3206 • 9/23/2025

Is being a SQL 'generalist' good enough in this US market? Layoff question!

Discussion

The community views the original poster's extensive experience with SQL, data analysis, and software development as a strength, not a drawback. Many believe that being a generalist provides more career options, as it demonstrates adaptability and a broad skill set. However, success in finding a new role often depends on effectively communicating the value and impact of these skills. Networking and personal referrals are favored over traditional job platforms like LinkedIn for job searching, though experiences vary. Despite concerns about potential layoffs, the overall sentiment is optimistic, with many confident that the original poster's skills are desirable in the current market.

37 comments

Posted in r/SQL by u/No_Lobster_4219 • 9/28/2025

What is a CROSS APPLY ?

SQL Server

CROSS APPLY is a SQL Server operator, distinct from CROSS JOIN. It behaves like a join that invokes a table-valued function or correlated subquery once for each row on the left, effectively working as a for-each loop in SQL. It is considered particularly efficient for joining to another table when only the most recent entry from that table is required. However, some developers caution against its use because queries that rely on it can be difficult to debug and review. Others argue that CROSS APPLY can help clean up code where common table expressions (CTEs) might become bloated. Several other databases offer comparable behavior under the name LATERAL join.
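To make the for-each description concrete, here is a minimal sketch in plain Python rather than T-SQL. It only models the semantics and is not how SQL Server executes the operator. The `cross_apply` helper, the `latest_order` function, and the sample customers and orders are all hypothetical.

```python
def cross_apply(left_rows, right_for):
    """For each left row, call right_for(row) and emit one merged row per result.

    Left rows for which right_for returns nothing are dropped, mirroring
    CROSS APPLY (OUTER APPLY would keep them, padded with NULLs).
    """
    result = []
    for left in left_rows:
        for right in right_for(left):
            result.append({**left, **right})
    return result


# Made-up sample data for illustration only.
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
    {"customer_id": 3, "name": "Edsger"},  # has no orders
]
orders = [
    {"customer_id": 1, "order_date": "2025-09-01", "total": 40},
    {"customer_id": 1, "order_date": "2025-09-20", "total": 15},
    {"customer_id": 2, "order_date": "2025-08-11", "total": 99},
]


def latest_order(customer):
    """Stand-in for a correlated 'TOP 1 ... ORDER BY order_date DESC' subquery."""
    matching = [o for o in orders if o["customer_id"] == customer["customer_id"]]
    matching.sort(key=lambda o: o["order_date"], reverse=True)
    return [{"order_date": o["order_date"], "total": o["total"]} for o in matching[:1]]


latest = cross_apply(customers, latest_order)
for row in latest:
    print(row)
```

The right-hand function receives the current left row, which is what separates this pattern from a CROSS JOIN, where the right side cannot refer to the left. The customer without orders does not appear in the output.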

42 comments

Subscribe to data-subtldr

Get weekly summaries of top content from r/dataengineering, r/MachineLearning and more directly in your inbox.
