data subTLDR week 25 year 2025
r/MachineLearning, r/dataengineering, r/SQL
SQL's Mixed Reception Among Developers, Hard Topics in Data Engineering, and Building an Open-Source Government Data Pipeline: A Tale of Challenges and Triumphs
•Week 25, 2025
Posted in r/dataengineering by u/hijkblck93 • 6/20/2025
519
What are the “hard” topics in data engineering?
Discussion
Data engineering professionals on Reddit highlighted the importance of understanding the inner workings of data structures, storage, and underlying database mechanics, emphasizing the value this offers in debugging and optimizing workloads. Data migration was also noted as a crucial and often challenging aspect of the field, but one that can lead to job opportunities due to its necessity and low appeal. Business knowledge and the ability to deliver real value, rather than just constructing complex data systems, were also deemed essential. Lastly, the ability to understand an existing codebase instead of opting for a rewrite was advised, alongside effectively communicating technical limitations to non-technical stakeholders. The overall sentiment was constructive and solution-oriented.
Posted in r/MachineLearning by u/Nyaalice • 6/22/2025
517
[P] This has been done like a thousand time before, but here I am presenting my very own image denoising model
Project
The discussion revolves around improving an image denoising model, specifically for smooth noise like Gaussian and Poisson. Suggestions include approaching the task similarly to an upsampling task, as it requires understanding features at a deeper level than the local pixel distribution. Real-world testing was also suggested, using high-end digital cameras to create noisy and clean image pairs. Other contributors discussed the use of U-Nets, which can handle different image processing tasks within a single framework. The sentiment is mixed, with many offering advice and sharing their experiences, but also acknowledging the challenges in denoising complex features.
Posted in r/dataengineering by u/psgpyc • 6/22/2025
358
Interviewer keeps praising me because I wrote tests
Discussion
Writing tests in data engineering is seen as challenging but crucial, as reflected in a post discussing an applicant's positive experience of including basic tests in their task submission. While some users joked about testing in production, the consensus among top comments was that testing is often neglected due to the complexity and time required, especially in maintaining up-to-date frameworks and handling external APIs. However, there was strong support for the importance of not just unit tests, but also integration and stress tests, which better validate a pipeline's functionality. The takeaway was that any testing is better than none, as it demonstrates a proactive approach to tackling inevitable data issues.
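As a rough illustration of the "any testing is better than none" point, here is a minimal sketch of basic unit tests for a pipeline step. The `dedupe_latest` transform, its record fields (`id`, `updated_at`, `value`), and the test names are hypothetical and not taken from the thread; the tests are plain `assert` functions that a runner such as pytest could also collect.

```python
def dedupe_latest(records):
    """Keep the most recent record per id, dropping rows without an id.

    Hypothetical transform: assumes `updated_at` is an ISO-8601 date string,
    so plain string comparison orders the values chronologically.
    """
    latest = {}
    for rec in records:
        rec_id = rec.get("id")
        if rec_id is None:
            continue  # rows without a key cannot be deduplicated
        current = latest.get(rec_id)
        if current is None or rec["updated_at"] > current["updated_at"]:
            latest[rec_id] = rec
    return sorted(latest.values(), key=lambda r: r["id"])


def test_keeps_latest_version():
    rows = [
        {"id": 1, "updated_at": "2025-06-01", "value": "old"},
        {"id": 1, "updated_at": "2025-06-20", "value": "new"},
    ]
    assert dedupe_latest(rows) == [
        {"id": 1, "updated_at": "2025-06-20", "value": "new"}
    ]


def test_drops_rows_without_id():
    rows = [{"id": None, "updated_at": "2025-06-01", "value": "x"}]
    assert dedupe_latest(rows) == []


def test_empty_input():
    assert dedupe_latest([]) == []


if __name__ == "__main__":
    test_keeps_latest_version()
    test_drops_rows_without_id()
    test_empty_input()
```

Tests like these only cover a single function in isolation; the integration and stress tests the commenters favored would exercise the pipeline against real or realistic sources.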
Posted in r/MachineLearning by u/Bright_Aioli_1828 • 6/22/2025
347
[P] I made a website to visualize machine learning algorithms + derive math from scratch
Project
The community responded positively to a newly created open-source website visualizing machine learning algorithms and deriving mathematics from scratch. The inclusion of code with each visualization was praised, and users compared the site favorably with existing resources like d2l.ai. Users found the visual focus on math exceptionally useful. Some expressed eagerness to contribute to the project and follow its development. There was one inquiry about the reversed chapter arrangement. Overall, the sentiment was overwhelmingly appreciative, with users lauding the work as impressive, cool, and a good idea.
Posted in r/SQL by u/Fabulous_Bluebird931 • 6/17/2025
254
Client said search “just stopped working” ... found a SQL query building itself with str_replace
Resolved
The thread reveals a common scenario in software development: rushed solutions leading to problems down the line. The original developer's temporary fix, which assembled the SQL query with str_replace, broke the search whenever a term included a single quote. The issue was solved by rewriting the function with prepared statements and input validation. The incident highlights the risk of accidental SQL injection and the importance of quality code. While many commenters sympathize with the pressures that lead to temporary solutions, they acknowledge the problems these can create. The overall sentiment is mixed, combining humor with a serious call for better development practices.
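The fix described above can be sketched roughly as follows. The original code is not shown in the summary (and `str_replace` suggests it was PHP), so this is a stand-in using Python's built-in `sqlite3` module; the `products` table, the `search_products` function, and the 100-character limit are hypothetical choices made for illustration.

```python
import sqlite3


def search_products(conn, term):
    """Return product names containing `term`, using a bound parameter."""
    # Basic input validation (the limit of 100 characters is an assumption).
    if not isinstance(term, str) or len(term) > 100:
        raise ValueError("invalid search term")
    # Escape LIKE wildcards so the user's text is matched literally.
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    pattern = f"%{escaped}%"
    # The `?` placeholder keeps the search term out of the SQL text, so a
    # single quote in the term cannot break or alter the statement.
    rows = conn.execute(
        "SELECT name FROM products WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (pattern,),
    ).fetchall()
    return [name for (name,) in rows]


# Hypothetical in-memory data for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("O'Brien's Tea",), ("Green Tea",), ("Coffee",)],
)

print(search_products(conn, "O'Brien"))       # a quote in the term is safe
print(search_products(conn, "' OR '1'='1"))   # treated as literal text
```

Because the term is passed as data and never spliced into the query string, the single-quote failure and the injection risk are addressed by the same change.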
Posted in r/dataengineering by u/Own-Foot7556 • 6/22/2025
220
I talked to someone telling Gen AI is going to take up the DE job
Career
The discussion suggests an overall positive sentiment towards the future of data engineering despite the rise of generative AI (GenAI). Many believe that while GenAI will automate some tasks, it will mainly augment the roles of data engineers and data scientists, making them more efficient and enabling them to tackle more complex issues. Moreover, automation is expected to increase the demand for these roles due to the emergence of more software tools and complexity in data operations. Some anticipate that automation will impact data science roles before data engineering, due to tools like AutoML automating significant parts of machine learning workflows. A few participants expressed concerns about businesses prematurely adopting AI at the expense of workforce quality. Overall, the consensus is that understanding the fundamentals of the field is crucial for leveraging AI effectively.
Posted in r/SQL by u/Spiritgolem_Eco • 6/20/2025
185
Is SQL the "Capybara" of programming languages?
Discussion
Programmers have varying opinions about SQL, although there is a significant level of appreciation for its robust design and specific functionality. SQL is seen as indispensable for data analysis, with comparisons made to the role of equations in mathematics. However, some criticize it for poor error handling and difficulty in debugging complex queries. ORMs remain popular despite these limitations. Front-end programmers, in particular, express a dislike for SQL, yet acknowledge its irreplaceability. A few users perceive SQL as a necessary evil, citing the lack of compile-time validation and clunky schema migrations. Overall, the sentiment is mixed but leans towards the positive.
Posted in r/dataengineering by u/sspaeti • 6/20/2025
177
The Data Engineering Toolkit
Blog
The Data Engineering Toolkit, a comprehensive open-source resource for data engineers, has been positively received by the community. The toolkit includes over 70 technologies and tools, 10 core knowledge areas, and several programming languages. Users lauded the toolkit for its aggregation of information about data modelling and its usefulness for newcomers to the field. Some users, however, raised concerns about the lack of context for choosing between similar tools. Suggestions were also made for potential additions to the toolkit, indicating a strong interest in its continued development and improvement. Overall, the sentiment towards the toolkit is positive.
Posted in r/MachineLearning by u/Single-Blackberry885 • 6/17/2025
174
[D] Burned out mid-PhD: Is it worth pushing through to aim for a Research Scientist role, or should I pivot to industry now?
Discussion
The consensus among experienced commenters is that burnout and dry spells during a PhD are normal and widely shared. While converting a PhD into a Research Scientist role can be challenging and potentially more stressful, completing a PhD is viewed as an achievement in itself. It is also suggested that engineering roles may offer more stability and options in the event of layoffs. There is a strong emphasis on self-care and maintaining perspective during these challenging phases. The sentiment is largely positive, emphasizing perseverance and resilience in the face of adversity.
Posted in r/MachineLearning by u/bawkbawkbot • 6/16/2025
144
I'm not obsolete, am I? [P]
Project
The Reddit community appreciates the effectiveness and efficiency of the chicken recognition bot, bawkbawkbot, highlighting that if it works and is cost-effective, it remains the best solution for its purpose. There is no compelling need to implement a more complex, resource-intensive model like multimodal LLMs for such a focused task. However, some users noted the potential benefits of more advanced models, such as better generalization and robustness to unusual data. The sentiment is predominantly positive, with users supporting the continued relevance of CNNs and old-school computer vision approaches, especially in niche applications with limited resources.
Posted in r/MachineLearning by u/Fantastic-Nerve-4056 • 6/16/2025
106
ML Research: Industry vs Academia [D]
Discussion
The discussion reveals a consensus on the key differences between machine learning research in academia and industry. The industry, with its ample computational resources, prioritizes an empirical, product-focused approach, often lacking the research vibe found in academia. Academia, despite having fewer resources, emphasizes theoretical work and mathematical proof. Concerns were raised about academic research's integrity, suggesting that some published work fails when tested rigorously. However, it was also noted that the freedom and leadership role offered in academia can be appealing, despite the administrative load. The sentiment was mixed: while some favor the practical, value-driven approach of industry, others prefer the intellectual pursuit in academia.
Posted in r/SQL by u/gaz2133 • 6/19/2025
79
Do You use sql for a living?
Discussion
The sentiment among professionals who use SQL indicates a mixed relationship with the tool. Many value its utility for querying databases, while others express a desire to spend more time with SQL. Frustration lies in the time spent on tasks such as creating dashboards or attending meetings. Moreover, the challenge of learning SQL independently has been highlighted. Despite these issues, there is a clear fondness for SQL, with some even expressing a passion for it. The ease of writing SQL and its use in managing business logic within job roles was also emphasized.
Posted in r/SQL by u/intimate_sniffer69 • 6/22/2025
62
I have no idea where to go next in my career. I'm clueless
Discussion
The discussion revolves around career progression paths for data professionals. Several commenters suggest moving to management as a possible vertical progression, although it comes with increased responsibility and requires strong people skills. Others advise exploring different roles such as Data Architect, Data Engineer, Analytics Engineer, or Principal IC, based on personal interests and desired challenges. Gaining experience in different businesses or industries is also recommended to broaden one's technical expertise. The sentiment is mixed, with some expressing uncertainty about their next steps and others sharing insights from their own career paths. Overall, the conversation emphasizes the importance of continuous learning and adaptability in the data field.