
data subTLDR week 31 year 2025

r/MachineLearning, r/dataengineering, r/SQL

Combatting Imposter Syndrome in SQL Development, Exploring Version Control Techniques, Hunting for the Perfect SQL IDE, Navigating the Evolving Data Engineering Job Market, Recreating Kafka's Core Logic in Python

Week 31, 2025
Posted in r/dataengineering by u/MoRakOnDi, 7/28/2025
450

Data Engineering Job Market - What the Hell Happened?

Discussion
The data engineering job market has become more competitive following industry layoffs, and companies are increasingly picky, often holding unreasonable expectations. Data engineers are now expected to act as business analyst, data engineer, and data analyst at once. There is a growing gap between job descriptions and actual work, and transferable skills are often overlooked in favor of exact tech stack matches. Employers are blamed for expanding responsibilities to cut costs. Data engineers are also expected to have in-depth knowledge of areas that traditionally fell under DevOps, and to manage more stakeholders as business analysts and middle managers are cut.
127 comments
Posted in r/dataengineering by u/Substantial_Fig_7849, 7/29/2025
389

Built Kafka from Scratch in Python (Inspired by the 2011 Paper)

Open Source
The developer's attempt to recreate Kafka's core logic in Python drew a mixed response. Some appreciated the effort and its educational value for understanding how Kafka works; others pointed out the absence of critical features such as topic partitioning, segmenting, storage, and restrained pulling on the consumer side, and suggested that the project was closer to a simple observer pattern with a broker than to Kafka's distinctive design. Commenters also suggested cleaning up the repository, refining the code, and sharing the project in relevant communities for more feedback. Overall, the thread read as constructive criticism focused on areas for improvement.
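To make the missing pieces concrete, here is a minimal sketch of a partitioned, append-only log with offset-based pulling. It is not the poster's code and not Kafka's implementation: the `MiniLog` class, its method names, and the CRC32 key hashing are assumptions chosen for illustration, and segmenting and on-disk storage are left out entirely.

```python
import zlib
from collections import defaultdict


class MiniLog:
    """Hypothetical in-memory log: one append-only list per partition."""

    def __init__(self, num_partitions=2):
        self.num_partitions = num_partitions
        self.partitions = [[] for _ in range(num_partitions)]
        # (consumer group, partition) -> next offset to read
        self.next_offset = defaultdict(int)

    def produce(self, key, value):
        # Same key always maps to the same partition, which preserves
        # per-key ordering. CRC32 is a stand-in hash for this sketch.
        partition = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        self.partitions[partition].append(value)
        return partition, len(self.partitions[partition]) - 1

    def poll(self, group, partition, max_records=10):
        # Consumers pull at their own pace, at most max_records per call.
        # Records are never removed, so each group tracks its own offset.
        start = self.next_offset[(group, partition)]
        batch = self.partitions[partition][start:start + max_records]
        self.next_offset[(group, partition)] = start + len(batch)
        return batch


log = MiniLog(num_partitions=2)
p, first_offset = log.produce("user-1", "login")
_, second_offset = log.produce("user-1", "logout")

batch_a = log.poll("analytics", p, max_records=1)  # first record only
batch_b = log.poll("analytics", p)                 # resumes after batch_a
batch_other = log.poll("billing", p)               # independent group, reads from offset 0
```

The contrast with an observer pattern is that the broker never pushes to consumers here: each consumer group asks for a bounded batch and keeps its own position in a log that outlives any single read.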
42 comments
Posted in r/dataengineering by u/big_like_a_pickle, 8/2/2025
274

I used to think data engineering was a small specialty of software engineering. I was very mistaken.

Discussion
The discussion highlighted the common misconception about data engineering being a subset of software engineering. The original poster, a seasoned software engineer, noted the profound differences in the complexity, skill set, and hardware requirements between the two. Many users agreed, emphasizing the unique challenges and complexities of data engineering, including data lineage, feature stores, and the use of advanced tools like Pandas, Polars, and Dask. The sentiment was generally positive and respectful towards data engineers, with many participants expressing newfound admiration for the field. The thread also encouraged software engineers to better understand and appreciate the role of data engineering.
44 comments
Posted in r/MachineLearning by u/bill1357, 8/2/2025
154

[R] From Taylor Series to Fourier Synthesis: The Periodic Linear Unit

Research
Shiko Kudo's Periodic Linear Unit (PLU) activation function drew mixed feedback, with many commenters emphasizing the need for a thorough literature review and rigorous testing. The PLU uses cascaded sinusoidal waveforms for approximation, unlike traditional activations built from linear components and non-linearities. Critics argue that the PLU introduces complexity and biased activations, which goes against the Bitter Lesson, and advise testing on varied domains to ground the concept. Some also claim that similar results can be achieved with ReLU and question the fairness of the comparison. Still, commenters generally acknowledge that the PLU could be useful, especially in tasks with periodic components, and recommend more rigorous, quantitative experiments to strengthen the work.
44 comments
Posted in r/SQL by u/Dry-Presentation9295, 7/30/2025
120

I feel like a fraud

MySQL
The thread revolves around a systems developer seeking advice on improving their problem-solving skills, particularly in SQL data migration tasks. The community encourages the individual to rely less on AI tools like ChatGPT, suggesting they instead solve problems independently using resources like documentation and StackOverflow. They also advise not to shy away from asking questions, emphasizing its importance in learning. Some suggest using ChatGPT as a reference only after creating a solution, while others emphasize hands-on methods like typing code instead of copying-pasting. The overall sentiment leans towards viewing problem-solving as a skill that improves with experience and constant learning.
49 comments
Posted in r/SQL by u/phatdoof, 8/1/2025
103

How do you “version control” your sql tables?

Discussion
In a discussion about version control for SQL tables, the consensus suggested that migrations are the common method for handling database versions. By tracking which migrations ran and when, one can control what needs to run or be undone. Tools like Flyway and Liquibase were mentioned, which handle migrations sequentially and track changes across the database respectively. Database projects were also suggested, where every object has a file checked into Git like any other project. However, some users humorously noted the common practice of simply renaming tables as a form of version control. The sentiment was mixed, acknowledging both orderly and chaotic practices.
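The bookkeeping behind migrations can be sketched in a few lines. This is a minimal illustration using Python's built-in `sqlite3` module, not how Flyway or Liquibase are implemented; the `schema_migrations` table, the `apply_migrations` helper, and the example `customers` schema are all hypothetical names chosen for the sketch.

```python
import sqlite3

# Hypothetical migrations: (version, SQL) pairs applied in ascending order.
MIGRATIONS = [
    (1, "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "ALTER TABLE customers ADD COLUMN email TEXT"),
]


def apply_migrations(conn, migrations):
    """Run migrations not yet recorded; return the versions applied."""
    # The tracking table records which migrations ran and when.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations ("
        "version INTEGER PRIMARY KEY, "
        "applied_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}

    newly_applied = []
    for version, sql in sorted(migrations):
        if version in applied:
            continue  # already ran against this database
        conn.execute(sql)
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        conn.commit()
        newly_applied.append(version)
    return newly_applied


conn = sqlite3.connect(":memory:")
first_run = apply_migrations(conn, MIGRATIONS)   # applies versions 1 and 2
second_run = apply_migrations(conn, MIGRATIONS)  # nothing left to apply
columns = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]
```

Because the migration list lives in source files, it can be checked into Git like any other code, while the tracking table tells each environment which steps it still needs. Undo support would need a matching "down" statement per version, which this sketch omits.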
89 comments
Posted in r/MachineLearning by u/LetsTacoooo, 7/30/2025
85

[R] Deepmind's AlphaEarth Foundations helps map our planet in unprecedented detail

Research
DeepMind's AlphaEarth Foundations was praised for mapping the planet in unprecedented detail, in particular for its use of machine learning to analyze satellite images. The team's extensive research paper is expected to be a valuable resource. Sentiment in the thread supports the project's utility and contribution to the field.
1 comment