data subTLDR week 32 year 2025
r/MachineLearning, r/dataengineering, r/SQL
AI's Impact on Critical Thinking in Data Analytics Education, Power of Custom SQL Functions, Effective Historical Data Storage Strategies, GPT-5's Role in Data Engineering, and a Comprehensive Free Beginner Data Engineering Course
Posted in r/MachineLearning by u/we_are_mammals • 8/10/2025
682
[D] Reminder that Bill Gates's prophesy came true
Discussion
The discussion centered around the realization of Bill Gates's prediction, attracting a largely positive response. Many participants acknowledged and admired Gates's foresight, while others were surprised by the accuracy of his prophecy. A few skeptics questioned the context and specifics of his prediction. The majority agreed on the impact Gates's vision has had on modern society, with some expressing concern about potential future prophecies. Despite some dissenting views, the prevailing sentiment was one of respect and appreciation for Gates's prescience.
Posted in r/dataengineering by u/eczachly • 8/8/2025
499
GPT-5 release makes me believe data engineering is going to be 100% fine
Discussion
Commenters see GPT-5 as a useful tool for those already competent with AI: its speed boosts productivity, particularly for tasks like generating pipeline DAGs or updating SQL code. However, users don't believe it will replace data engineers (DEs); they expect DE work to become increasingly intertwined with AI instead. Some are skeptical of AI's capabilities and the hype surrounding it, suggesting that the risk to jobs comes more from the speculative AI bubble than from the technology itself. There is also a view that businesses won't trust AI to build complex, nuanced data pipelines without human oversight.
Posted in r/SQL by u/tits_mcgee_92 • 8/5/2025
489
Teaching data analytics has made me realize how much AI is eroding critical thinking skills.
Discussion
In a discussion about the impact of AI on data analytics education, there's a widespread concern that AI's ease of use is eroding critical thinking skills amongst students. Educators note that students often struggle with ambiguous, real-world scenarios due to over-reliance on AI, failing to grasp fundamental SQL operations like 'GROUP BY' or 'ORDER BY'. Participants in the hiring process also share this sentiment, citing difficulties in finding capable candidates despite good incentives. However, the discussion also highlights the value of setting challenging questions and real-world tasks to improve student skills. The overall sentiment is mixed, acknowledging AI's convenience but cautioning against over-dependence.
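For readers unsure how the two operations named above differ, here is a minimal sketch using Python's built-in sqlite3 module. The orders table and its rows are hypothetical sample data, not taken from the thread.

```python
import sqlite3

# Hypothetical sample data; table and column names are illustrative only.
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ana", 20.0), ("ben", 5.0), ("ana", 15.0), ("cy", 40.0), ("ben", 10.0)],
)

# GROUP BY collapses the rows into one row per customer;
# ORDER BY then sorts those grouped rows by the aggregated total.
totals = orders_db.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
    """
).fetchall()

print(totals)  # [('cy', 40.0), ('ana', 35.0), ('ben', 15.0)]
```

GROUP BY changes how many rows come back, while ORDER BY only changes their order.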
Posted in r/dataengineering by u/joseph_machado • 8/5/2025
485
Free Beginner Data Engineering Course, covering SQL, Python, Spark, Data Modeling, dbt, Airflow & Docker
Blog
The free beginner data engineering course has received positive feedback for its comprehensive coverage of key concepts and tools. However, a highly upvoted comment takes a critical look at the industry's trend towards complex tools, advocating a return to core SQL and Python basics for efficient data processing. Some feel that new terms like medallion architecture are overhyped marketing labels for traditional data flow concepts. While modern tools offer benefits like testing, CI/CD, and a UI, they can introduce multiple points of failure if used without proper consideration of data architecture. Beginners and veterans alike appreciate the course's contribution to their learning journey.
Posted in r/dataengineering by u/victorviro • 8/5/2025
425
Keeping the AI party alive
Meme
The general sentiment around AI in the workplace is mixed. Some find humor in the potential risks, while others express frustration, especially around the misconception that AI simplifies all tasks. The complexity and controls needed for handling data with AI are highlighted, indicating that its implementation is not as straightforward as some may believe. There's also a sentiment of skepticism and self-deprecation about the actual contribution of individuals to AI development. A few users show signs of both enthusiasm and frustration, reflecting the duality of AI's potential and the challenges it brings.
Posted in r/MachineLearning by u/bigbird1996 • 8/6/2025
157
[D] Is modern academic published zero-sum?
Discussion
The discussion revolves around the perceived zero-sum nature of publishing in prominent academic conferences. Key concerns include the high rejection rates, the perceived stubbornness of reviewers, and the pressure to publish in A* conferences, potentially leading to incomplete work. Many commenters suggested that these issues are magnified in the ML/CS fields due to their unique culture of considering conference papers as highly as journal publications. The solution, as suggested by some, is to shift the emphasis back to journals and transform conferences into platforms for discussion and networking. The overall sentiment is mixed, with some acknowledging the problems, while others defend the system or suggest alternatives.
Posted in r/MachineLearning by u/HerpisiumThe1st • 8/5/2025
143
DeepMind Genie3 architecture speculation
Research
The DeepMind Genie3 architecture, a significant improvement over its predecessor, has sparked considerable speculation. Commenters broadly highlight Genie3's impressive emergent ability to maintain consistent, dynamic 3D environments. The system generates worlds frame by frame from the world description and user actions, unlike methods such as NeRFs and Gaussian Splatting, which require an explicit 3D representation. Some believe Genie3 combines a high-resolution initial image model, a latent 3D data structure, and a world model. Others suggest it might maintain visual quality and temporal consistency using bi-directional transformers in a causal model. The memory required to maintain these worlds, however, remains unclear. Overall, the sentiment is positive, with users expressing awe and curiosity.
Posted in r/MachineLearning by u/seraschka • 8/10/2025
93
[P] From GPT-2 to gpt-oss: Analyzing the Architectural Advances And How They Stack Up Against Qwen3
Project
The discussion around the advances from GPT-2 to gpt-oss was predominantly positive, with participants appreciating the post and finding it informative, particularly for keeping up with architectural changes in Large Language Models (LLMs). A few constructive criticisms pointed out errors in the post, including incorrect images and issues with section numbering. The author responded positively and addressed the issues promptly. Overall, the thread was positive and constructive, reflecting a shared interest in the evolution of AI and LLMs.
Posted in r/SQL by u/Pillstyr • 8/7/2025
90
What custom functions have you created in SQL that made your life easier?
Discussion
Several SQL users have shared their experience creating custom functions to improve efficiency. Top contributions include functions for managing dates, calculating distance between latitude/longitude points, and handling data entry errors. Some mentioned the use of techniques like creating Dates and Numbers tables for better readability and performance. There's also a focus on leveraging logic to avoid repetitive tasks. Custom functions to deal with formatting issues, especially in inconsistent data entries, were also highlighted. Many users found these functions to be pivotal in their daily tasks, improving productivity and data management. The overall sentiment is positive, with users appreciating the power and flexibility custom functions bring to SQL.
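As one illustration of the latitude/longitude idea, here is a minimal sketch that registers a custom distance function with SQLite through Python's sqlite3 module. It is not code from the thread: the function name haversine_km, the choice of the haversine formula, and the mean Earth radius of 6371 km are all assumptions made for this example.

```python
import math
import sqlite3

def haversine_km(lat1, lon1, lat2, lon2):
    """Approximate great-circle distance in km between two points given in degrees."""
    radius_km = 6371.0  # assumed mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * radius_km * math.asin(math.sqrt(a))

geo_db = sqlite3.connect(":memory:")
# Expose the Python function to SQL under the same (hypothetical) name, taking 4 arguments.
geo_db.create_function("haversine_km", 4, haversine_km)

# One degree of longitude along the equator, computed inside a SQL query.
distance = geo_db.execute(
    "SELECT haversine_km(0.0, 0.0, 0.0, 1.0)"
).fetchone()[0]
print(round(distance, 2))
```

Other engines define custom functions in their own dialect (for example CREATE FUNCTION), but the benefit is the same: the formula lives in one place and queries stay readable.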
Posted in r/SQL by u/Fruitloopes • 8/5/2025
59
how do you usually handle storing historical changes in a SQL database without making things a nightmare to query?
MySQL
Most respondents suggest using separate history tables or slowly changing dimensions (SCD) to store historical changes in a SQL database. SQL Server temporal tables are also recommended, as they pair the current table with a separate history table and record valid-from/valid-to timestamps for each row. This approach is described as particularly useful for point-in-time restoration after a delta file failure. Some also suggest maintaining a change log table that records every insert, update, and delete. Before choosing an approach, commenters advise defining the requirements clearly so that storage can be handled effectively.
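The valid-from/valid-to pattern can be sketched as follows, using Python's sqlite3 module. This is a hand-rolled, SCD Type 2 style history table, not SQL Server's temporal table feature, and the table, columns, and helper functions (customer_history, change_email, email_as_of) are hypothetical names chosen for illustration.

```python
import sqlite3

hist_db = sqlite3.connect(":memory:")
hist_db.execute(
    """
    CREATE TABLE customer_history (
        customer_id INTEGER,
        email       TEXT,
        valid_from  TEXT,  -- ISO dates, so string comparison matches date order
        valid_to    TEXT   -- NULL marks the current version
    )
    """
)

def change_email(db, customer_id, new_email, changed_at):
    """Close the current version (if any) and insert the new one."""
    db.execute(
        "UPDATE customer_history SET valid_to = ? "
        "WHERE customer_id = ? AND valid_to IS NULL",
        (changed_at, customer_id),
    )
    db.execute(
        "INSERT INTO customer_history VALUES (?, ?, ?, NULL)",
        (customer_id, new_email, changed_at),
    )

def email_as_of(db, customer_id, as_of):
    """Return the email that was valid on the given date, or None."""
    row = db.execute(
        """
        SELECT email FROM customer_history
        WHERE customer_id = ?
          AND valid_from <= ?
          AND (valid_to IS NULL OR valid_to > ?)
        """,
        (customer_id, as_of, as_of),
    ).fetchone()
    return row[0] if row else None

change_email(hist_db, 1, "old@example.com", "2025-01-01")
change_email(hist_db, 1, "new@example.com", "2025-06-01")

print(email_as_of(hist_db, 1, "2025-03-15"))  # old@example.com
print(email_as_of(hist_db, 1, "2025-08-01"))  # new@example.com
```

Treating each row's validity as a half-open interval (valid_from inclusive, valid_to exclusive) keeps point-in-time queries to a single range filter and avoids two versions matching the same date.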