data subTLDR week 23 year 2026
r/MachineLearning, r/dataengineering, r/SQL
Upskilling in Downtime: A Beacon of Hope in Professional Stagnation, The SQL Comfort Spectrum: From Months to Years, After SQL: The Debate on What Comes Next, and The 10x Dagster Price Hike: Users' Frustration and Search for Alternatives
•Week 23, 2026
Posted in r/dataengineering by u/sspaeti • 6/3/2026
761
101 concepts every data engineer should know (or some of them :)
Blog
The data engineering concepts page, recently updated with new terms, previews, and backlinks, was well received. Commenters found it a valuable resource, especially for newcomers to the field such as apprentices. Some were surprised by the sheer breadth of concepts, a reminder of how much there is to learn in data engineering, but appreciation for the resource was general. A bug in the mobile version was reported and promptly fixed, and one commenter asked where to find the URL, suggesting the page could be easier to navigate to.
Posted in r/dataengineering by u/CircleRedKey • 6/2/2026
267
dagster price increase 10x insane , don't ever use them
Blog
Dagster's recent price increase has frustrated many users, particularly those running smaller workloads on the service. Commenters called the 10x increase insane and delusional, and some said they were quoted prices almost twice their entire existing bills. Self-hosting Dagster has worked well for many, and the price hike may push more teams back toward it. The company's shift toward revenue generation also raised questions about the future of its open-source side. MotherDuck was mentioned as an alternative, though others noted it serves a different function. Overall, sentiment was negative because of how abruptly and steeply costs rose.
Posted in r/dataengineering by u/Known-Huckleberry-55 • 6/1/2026
235
dbt Core v2 is here: still open source, now rebuilt for what's next
Open Source
The release of dbt Core v2 drew mixed reactions. The most upvoted comments expressed disappointment that column-level lineage is absent from the core version. Some users were surprised that dbt development continues after the Fivetran merger and pointed to the emergence of alternatives like 'dbt-fusion'. Others worried that open-source components could be restricted in the future, though some were optimistic about the stated commitment to 'open data infrastructure'. Several commenters felt the strategy behind dbt Core v2 borrows from the Databricks playbook, balancing open-source components with proprietary features.
Posted in r/SQL by u/Ifuqaround • 6/2/2026
143
I no longer feel like there's anything I can offer to my current organization. Anyone feel the same?
Oracle
Most commenters encouraged a professional who feels stagnant and obsolete in his current role to use the downtime to upskill and add more value to the organization, rather than sit idle. They suggested training resources for Oracle, Microsoft, cloud ETL tools, Python, or AI. Some advised looking for problems to solve within the organization, while others warned against drawing attention to the lack of work, since it could make the role a target for layoffs. Overall, sentiment was largely positive, emphasizing continuous learning and using the quiet period for personal and professional growth.
Posted in r/MachineLearning by u/Asleep-Requirement13 • 6/3/2026
105
NeurIPS used uncalibrated AI detector for desk rejections [D]
Discussion
NeurIPS 2026's use of the AI text detector Pangram to desk-reject Position Paper submissions has sparked controversy over the tool's calibration and false-positive rate. Commenters overwhelmingly criticized the approach, pointing out the irony of an AI conference being tripped up by a potentially inaccurate AI tool. They questioned the fairness and validity of the process; some ran their own papers through the detector and received high AI scores, suggesting the system may be flawed. Overall, sentiment was negative, with calls for a reliable and fair AI detector and criticism of how the NeurIPS organizers handled the process.
Posted in r/MachineLearning by u/NielsRogge • 6/4/2026
86
On-policy distillation: one of the hottest terms on PapersWithCode [R]
Research
On-policy distillation (OPD) is gaining traction in AI research and has been used in models such as Qwen and DeepSeek. As described in the thread, OPD lets a model learn from its own errors in real time: hint tokens are inserted at the point of the error, and the model is discouraged from repeating that mistake. This makes for a more challenging training loop, but it may save time otherwise spent on hyperparameter tuning. Some commenters wondered whether OPD could be applied to continual pre-training, given the potential computational cost. The discussion also noted that PapersWithCode has merged with Hugging Face's Daily Papers, with current listings sourced from daily submissions and indexed using GitHub star velocity. Overall, sentiment was positive, with interest in exploring and applying OPD further and appreciation for the PapersWithCode project.
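For readers new to the term, here is a minimal sketch of one common formulation of on-policy distillation, assuming toy bigram "models" instead of neural networks. The student generates the sequence itself (that is what makes it on-policy), and the teacher scores every sampled token; the per-token loss log p_student - log p_teacher has the reverse KL divergence as its expectation. The hint-token mechanism mentioned in the thread is not modeled here, and all names (`STUDENT`, `TEACHER`, `sample`, `opd_token_losses`) and probabilities are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical toy "language models": each maps the previous token to a
# next-token distribution. Real OPD uses neural LMs; these are illustrative.
Dist = Dict[str, float]
Model = Dict[str, Dist]

STUDENT: Model = {
    "<bos>": {"a": 0.5, "b": 0.4, "<eos>": 0.1},
    "a": {"a": 0.3, "b": 0.3, "<eos>": 0.4},
    "b": {"a": 0.6, "b": 0.2, "<eos>": 0.2},
}
TEACHER: Model = {
    "<bos>": {"a": 0.8, "b": 0.1, "<eos>": 0.1},
    "a": {"a": 0.1, "b": 0.2, "<eos>": 0.7},
    "b": {"a": 0.3, "b": 0.1, "<eos>": 0.6},
}


def sample(model: Model, rng: random.Random, max_len: int = 10) -> List[Tuple[str, str]]:
    """Roll out a sequence from `model`, returning (context, token) pairs.
    Sampling from the student itself is what makes the method on-policy."""
    steps: List[Tuple[str, str]] = []
    prev = "<bos>"
    for _ in range(max_len):
        dist = model[prev]
        tok = rng.choices(list(dist), weights=list(dist.values()))[0]
        steps.append((prev, tok))
        if tok == "<eos>":
            break
        prev = tok
    return steps


def reverse_kl(p: Dist, q: Dist) -> float:
    """Exact KL(p || q); with p = student and q = teacher this is the reverse KL."""
    return sum(pi * math.log(pi / q[t]) for t, pi in p.items() if pi > 0)


def opd_token_losses(steps: List[Tuple[str, str]], student: Model, teacher: Model) -> List[float]:
    """Per-token loss log p_student(x_t) - log p_teacher(x_t) on student samples.
    Its expectation under the student equals the reverse KL at that position;
    a training step would push these values down by updating the student only."""
    return [math.log(student[c][t]) - math.log(teacher[c][t]) for c, t in steps]


if __name__ == "__main__":
    rng = random.Random(0)
    steps = sample(STUDENT, rng)
    for (ctx, tok), loss in zip(steps, opd_token_losses(steps, STUDENT, TEACHER)):
        print(f"{ctx:>5} -> {tok:<5} loss={loss:+.3f}")
```

Tokens that the teacher finds much less likely than the student does receive a large positive loss, so the student is steered away from its own mistakes on exactly the sequences it actually produces. This dense, per-token signal on student-generated text is the main difference from ordinary distillation on fixed or teacher-written data.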
Posted in r/MachineLearning by u/nat-abhishek • 6/1/2026
74
What’s the actual focus in World Models right now? [R]
Research
Current focus in the World Models field appears split between video generation, the most visible frontier, and underlying research questions such as how to represent physical state and how to design update operators that remain stable over long horizons. The term World Models is used both for generative models that create coherent 4D worlds and for AI models that interact with such worlds. The general direction is to treat a fixed encoder's output as ground truth for reality, with the model either predicting the next encoder output or filling in blanks in masked encoder output. There is strong emphasis on models that can predict, reason, and support decision-making over time, with the aim of learning an internal world model that helps an agent act effectively. Latent-space reconstruction methods are gaining traction, particularly JEPA architectures, which commenters see as a step toward world models.
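To make the "fixed encoder as ground truth" idea concrete, here is a minimal sketch assuming a toy 2-D linear environment: the encoder is frozen, and only a linear predictor is trained to map the current latent to the next encoder output, so the loss lives in latent space rather than observation space. The dynamics matrix `A`, the `ENCODER` weights, and all function names are hypothetical; real systems (JEPA-style models included) use deep networks, and the masked "fill in the blanks" variant would replace the next-step target with the latents of masked-out inputs.

```python
import random
from typing import List

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, v: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Hypothetical toy environment: the "true" dynamics are x_{t+1} = A x_t.
A: Mat = [[0.9, -0.2], [0.2, 0.9]]

# Hypothetical frozen encoder: never updated; its outputs are the targets.
ENCODER: Mat = [[1.0, 0.5], [0.0, 1.0]]


def encode(x: Vec) -> Vec:
    return matvec(ENCODER, x)


def latent_loss(w: Mat, x: Vec) -> float:
    """0.5 * squared error between the predicted and actual *next latent*."""
    z, z_next = encode(x), encode(matvec(A, x))
    pred = matvec(w, z)
    return 0.5 * sum((p - t) ** 2 for p, t in zip(pred, z_next))


def train_predictor(steps: int = 3000, lr: float = 0.1, seed: int = 0) -> Mat:
    """SGD on a linear latent predictor W; the encoder stays fixed throughout."""
    rng = random.Random(seed)
    w: Mat = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        z, z_next = encode(x), encode(matvec(A, x))
        err = [p - t for p, t in zip(matvec(w, z), z_next)]
        # Gradient of 0.5 * ||W z - z_next||^2 w.r.t. W is outer(err, z).
        for i in range(2):
            for j in range(2):
                w[i][j] -= lr * err[i] * z[j]
    return w


if __name__ == "__main__":
    w = train_predictor()
    rng = random.Random(1)
    held_out = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(100)]
    print("learned predictor:", [[round(v, 3) for v in row] for row in w])
    print("mean latent loss:", sum(latent_loss(w, x) for x in held_out) / len(held_out))
```

Keeping the encoder frozen is what makes this objective well-posed in such a small example: if the encoder were trained jointly on the same loss, mapping every input to a constant latent would drive the loss to zero. That representation-collapse problem is something JEPA-style methods with learned encoders have to guard against.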
Posted in r/SQL by u/Wise_Safe2681 • 6/5/2026
30
How long did it take you to become comfortable writing SQL queries?
SQL Server
Commenters reported a wide range of timeframes for becoming comfortable writing SQL queries. A common view was that basic SQL commands can be picked up quickly, but complex queries take much longer to master. Some felt comfortable within a few months, while others with years of experience admitted they still feel like beginners given the depth of the subject. Several noted that a supportive work environment makes a real difference. Overall, the tone mixed encouragement with realism about the learning curve.
Posted in r/SQL by u/WhichAd6835 • 6/1/2026
23
What I should learn after SQL PL/SQL ??
Discussion
Commenters largely agreed that mastering a few tools, such as SQL, Python, and Snowflake, is more valuable than knowing many superficially. They advised polishing the resume and presentation skills and building portfolio projects, since these may be what is holding back interview invitations. Contributing to open-source projects and following online learning guides were also suggested. Some stressed the importance of industry experience and a solid grasp of data modeling concepts. Opinions were mixed on the value of learning AWS basics, with some saying it is not critical for landing interviews.
Subscribe to data-subtldr
Get weekly summaries of top content from r/dataengineering, r/MachineLearning and more directly in your inbox.