data subTLDR week 23 year 2026

r/MachineLearning, r/dataengineering, r/SQL

Upskilling in Downtime: A Beacon of Hope in Professional Stagnation, The SQL Comfort Spectrum: From Months to Years, After SQL: The Debate on What Comes Next, and The 10x Dagster Price Hike: Users' Frustration and Search for Alternatives

June 7, 2026 • Week 23, 2026

Posted in r/dataengineering by u/sspaeti • 6/3/2026

761

101 concepts every data engineer should know (or some of them :)

Blog

The data engineering concept page, recently updated with new terms, previews, and backlinks, was well received. Commenters called it a valuable resource, especially for newcomers such as apprentices. Some were surprised by the breadth of concepts, a reminder of how much there is to learn in data engineering, but the overall response was appreciative. A bug in the mobile version was reported and promptly fixed. One commenter asked where to find the URL, which suggests the page may be hard to locate.

54 comments

Posted in r/dataengineering by u/CircleRedKey • 6/2/2026

267

dagster price increase 10x insane , don't ever use them

Blog

Dagster's recent price increase has frustrated users, particularly those running smaller workloads. Commenters called the 10x increase insane and delusional, and some said they were quoted prices almost twice their entire existing bills. Many reported that self-hosting Dagster works well for them, and the price hikes may push more users back to self-hosting. The company's shift towards revenue generation raises questions about the future of its open-source side. MotherDuck was mentioned as an alternative, though commenters noted that it serves a different function. Overall, the sentiment is negative because of the abrupt and significant increase in cost.

107 comments

Posted in r/dataengineering by u/Known-Huckleberry-55 • 6/1/2026

235

dbt Core v2 is here: still open source, now rebuilt for what's next

Open Source

The release of dbt Core v2 has sparked mixed reactions among the community, with the most upvoted comments expressing disappointment over the absence of column-level lineage in the core version. Some users expressed surprise at the continuation of dbt development following the Fivetran merger, noting the emergence of alternatives like 'dbt-fusion'. Concerns were also raised about the potential for open-source components to be restricted in the future. However, there is also optimism about the commitment to 'open data infrastructure'. Several comments suggest that the strategy behind the development of dbt Core v2 seems to be drawing from the Databricks playbook, balancing open-source elements with proprietary features.

48 comments

Posted in r/SQL by u/Ifuqaround • 6/2/2026

143

I no longer feel like there's anything I can offer to my current organization. Anyone feel the same?

Oracle

The majority of Reddit users encouraged a professional who feels stagnant and obsolete in his current role to use the downtime to upskill and provide more benefit for the organization, rather than feeling unproductive. They suggested turning to training resources for Oracle, Microsoft, cloud ETL tools, Python, or AI. Some advised seeking out problems to solve within the organization, while others warned against highlighting a lack of work, as it could potentially lead to layoffs. Overall, the sentiment was largely positive, emphasizing continuous learning and leveraging existing conditions for personal and professional growth.

80 comments

Posted in r/MachineLearning by u/Asleep-Requirement13 • 6/3/2026

105

NeurIPS used uncalibrated AI detector for desk rejections [D]

Discussion

The use of the AI text detector, Pangram, in the desk rejection process of NeurIPS 2026 Position Paper submissions has sparked controversy due to concerns about calibration and false positive rates. Commenters overwhelmingly criticize the method, highlighting the irony of an AI conference falling victim to potential AI inaccuracies. They question the fairness and validity of the process, with some users testing their own papers which resulted in high AI scores, suggesting the system may be flawed. The overall sentiment is negative, with calls for a reliable and equitable AI detector and criticism of NeurIPS organizers for their methodology.
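The calibration concern comes down to base-rate arithmetic, which the minimal sketch below illustrates. Every number in it (submission count, share of AI-written papers, detector rates) is a hypothetical placeholder chosen for illustration, not a figure reported by NeurIPS or Pangram.

```python
def expected_flags(n_submissions, ai_fraction, false_positive_rate, true_positive_rate):
    """Expected detector outcomes for a batch of submissions.

    All inputs are hypothetical; none come from NeurIPS or Pangram.
    """
    human_written = n_submissions * (1 - ai_fraction)
    ai_written = n_submissions * ai_fraction
    false_flags = human_written * false_positive_rate  # human papers wrongly flagged
    true_flags = ai_written * true_positive_rate       # AI papers correctly flagged
    total_flags = false_flags + true_flags
    # Precision: of all flagged papers, the share that really are AI-written.
    precision = true_flags / total_flags if total_flags else 0.0
    return false_flags, true_flags, precision


# Hypothetical scenario: 1000 submissions, 10% AI-written,
# 2% false positive rate, 90% true positive rate.
false_flags, true_flags, precision = expected_flags(1000, 0.10, 0.02, 0.90)
print(f"wrongly flagged: {false_flags:.0f}, correctly flagged: {true_flags:.0f}, "
      f"precision: {precision:.2f}")
```

Under these assumed rates, a detector that looks accurate on paper would still wrongly flag some human-written papers in every batch, which is why commenters ask for a calibrated threshold before it is used for automatic rejections.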

63 comments

Posted in r/MachineLearning by u/NielsRogge • 6/4/2026

On-policy distillation: one of the hottest terms on PapersWithCode [R]

Research

On-policy distillation (OPD) is gaining traction in AI research and is used in models like Qwen and DeepSeek. OPD allows models to learn from their errors in real time by inserting hint tokens at the error point and discouraging the recurrence of that mistake. While this creates a more challenging training loop, it potentially saves time spent on hyperparameter tuning. Commenters were also curious whether OPD is practical for continual pre-training, given the potential computational expense. The conversation also noted the merging of PapersWithCode with Hugging Face's Daily Papers, with current publications being sourced from daily submissions and indexed using GitHub star velocity. Overall, sentiment was positive, with interest in continued exploration and application of OPD and appreciation for the PapersWithCode project.
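For readers new to the term, here is a minimal sketch of one common formulation of on-policy distillation: the student samples a token from its own distribution, and the training signal is the reverse KL divergence between student and teacher at that position. This is an assumption about the general technique, not the hint-token variant described in the thread, and the two toy distributions are invented for illustration.

```python
import math
import random


def reverse_kl(student_probs, teacher_probs):
    """Exact reverse KL, KL(student || teacher), over a toy vocabulary."""
    return sum(s * math.log(s / t)
               for s, t in zip(student_probs, teacher_probs) if s > 0)


def sampled_token_penalty(student_probs, teacher_probs, rng):
    """Sample a token from the student (on-policy) and score it against the teacher.

    The penalty log p_student(token) - log p_teacher(token) is a
    single-sample estimate of the reverse KL.
    """
    token = rng.choices(range(len(student_probs)), weights=student_probs, k=1)[0]
    penalty = math.log(student_probs[token]) - math.log(teacher_probs[token])
    return token, penalty


# Hypothetical next-token distributions over a 3-token vocabulary.
student = [0.7, 0.2, 0.1]
teacher = [0.5, 0.3, 0.2]

exact = reverse_kl(student, teacher)
rng = random.Random(0)
samples = [sampled_token_penalty(student, teacher, rng)[1] for _ in range(20000)]
estimate = sum(samples) / len(samples)
print(f"exact reverse KL: {exact:.4f}, sampled estimate: {estimate:.4f}")
```

Because the tokens are drawn from the student rather than from a fixed dataset, the penalty concentrates on the outputs the student actually produces, which is the "learning from its own errors" idea in its simplest form.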

17 comments

Posted in r/MachineLearning by u/nat-abhishek • 6/1/2026

What’s the actual focus in World Models right now? [R]

Research

The current focus of the World Models field appears to be split between video generation, the visible frontier, and the underlying research questions, such as building representations of physical states and designing update operators that stay stable over extended periods. The term World Models is often used to refer both to generative models that create coherent 4D worlds and to AI models that interact with these worlds. The general direction involves using a fixed encoder as a ground truth for reality, with the model predicting the next encoder output or filling in blanks from masked encoder output. There is significant emphasis on models that can predict, reason, and support decision-making over time, with the aim of learning an internal world model that helps an agent act effectively. Latent space reconstruction methods are gaining traction, particularly JEPA architectures, which are a step towards world models.
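The "fixed encoder, predict the next encoder output" idea can be sketched on a toy problem. The example below is an illustrative assumption, not code from the thread or from any JEPA implementation: the world is a damped point mass, the frozen encoder is a hand-picked 2x2 linear map, and the predictor is a 2x2 matrix trained with plain SGD on a loss computed in latent space.

```python
import random

# Hypothetical frozen encoder: a fixed linear map from a 2-D state to a 2-D latent.
ENCODER = [[1.0, 0.5], [-0.5, 1.0]]


def matvec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def step_world(state, dt=0.1, damping=0.9):
    """Toy dynamics: position advances by velocity, velocity decays."""
    pos, vel = state
    return [pos + vel * dt, vel * damping]


def encode(state):
    return matvec(ENCODER, state)


def latent_loss(predictor, state):
    """Squared error between predicted and actual next latent."""
    pred = matvec(predictor, encode(state))
    target = encode(step_world(state))
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def train_predictor(steps=2000, lr=0.1, seed=0):
    """Train only the predictor; the encoder is never updated."""
    rng = random.Random(seed)
    predictor = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(steps):
        state = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        z = encode(state)
        target = encode(step_world(state))
        pred = matvec(predictor, z)
        for i in range(2):
            err = pred[i] - target[i]
            for j in range(2):
                predictor[i][j] -= lr * 2 * err * z[j]  # gradient of squared error
    return predictor


untrained = [[0.0, 0.0], [0.0, 0.0]]
trained = train_predictor()
probe = [0.5, -0.3]
loss_before = latent_loss(untrained, probe)
loss_after = latent_loss(trained, probe)
print(f"latent loss before: {loss_before:.4f}, after: {loss_after:.2e}")
```

The design point is that the loss never touches raw observations: the predictor is judged only on how well it matches the encoder's view of the next state, which is the property the JEPA-style methods mentioned above rely on.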

24 comments

Posted in r/SQL by u/Familiar-Meaning-262 • 6/7/2026

Entry Level Data Analytics

Discussion

The discussion revolves around a recent business administration graduate seeking advice on entering the data analytics field, particularly with limited SQL experience. Contributors with diverse backgrounds agree that SQL skills, while important, can be developed on the job. They stress the value of understanding business context and communicating results effectively. Many suggest hands-on learning through projects addressing real business problems and practicing on platforms like DataCamp, Kaggle, and LeetCode. Networking and sharing work on platforms like GitHub are also recommended. The overall sentiment is positive, encouraging the individual to start applying even before feeling fully prepared.

20 comments

Posted in r/SQL by u/Wise_Safe2681 • 6/5/2026

How long did it take you to become comfortable writing SQL queries?

SQL Server

Many Reddit users shared their experiences on becoming comfortable with writing SQL queries, with responses indicating a wide range of timeframes. One common sentiment was that while basic SQL commands can be grasped quickly, it takes longer to master more complex queries. Some users reported feeling comfortable within a few months, while others, even with years of experience, admitted to still feeling like beginners due to the depth of the subject. There were also comments highlighting the influence of a conducive working environment on comfort levels. Overall, the sentiment was a mix of encouragement and realism about the learning curve.

63 comments

Posted in r/SQL by u/WhichAd6835 • 6/1/2026

What I should learn after SQL PL/SQL ??

Discussion

The consensus among Reddit users is that mastering a select few tools like SQL, Python, and Snowflake is more advantageous than learning many tools superficially. They advise improving one's resume and presentation skills and building portfolio projects, since weaknesses in these areas may be what is limiting interview opportunities. Contributing to open source projects and using online learning guides were also proposed. Some users stressed the importance of gaining industry experience and understanding data modeling concepts. Sentiment was mixed on the usefulness of AWS basics, with some users suggesting they are not critical for landing interviews.

23 comments

Posted in r/SQL by u/No_Presentation1421 • 6/7/2026

Lakebase/Neon experiences from users

PostgreSQL

Users are generally positive about their experiences with Lakebase following Databricks' acquisition of Neon. They appreciate the scalability and branching features, saying these can significantly reduce development and deployment time. Users also highlighted cost savings from features like auto-scaling and scale to zero. The ability to bypass complex ETL processes, thanks to Lakebase's integration with Databricks and its compatibility with AI apps, is also seen as a significant advantage. However, some users expressed a desire for a version of Lakebase decoupled from a workspace. The only potential downside noted is a possible issue with high concurrency.

21 comments

