
data subTLDR week 11 year 2026

r/MachineLearning, r/dataengineering, r/SQL

Exploring SQL's Power and Popularity, Dissecting Join Techniques, BigQuery Cost-Reduction Journey, and the Love for Analytics Engineering

Posted in r/MachineLearning by u/Important-Trash-4868 on 3/15/2026
350

[P] I got tired of PyTorch Geometric OOMing my laptop, so I wrote a C++ zero-copy graph engine to bypass RAM entirely.

Project
GraphZero v0.2 is a C++ graph engine built to prevent the out-of-memory crashes that can occur when training Graph Neural Networks on large datasets. According to the author, it bypasses system RAM entirely through a zero-copy design and uses multi-threading to improve performance. Commenters welcomed the project, calling it a step forward from approaches such as np.memmap, which often lead to bottlenecks and implicit RAM copies. Some suggested further improvements, such as a custom CUDA kernel to increase throughput. The overall sentiment was positive, with users commending the approach and offering constructive feedback.
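The summary does not say how GraphZero achieves zero-copy access. Memory-mapped files are one common technique for reading large data without loading it all up front, so the Python sketch below uses them as an assumption. It is not GraphZero's code or API: the file layout (a flat run of little-endian int32 ids) and the helpers `write_ids` and `read_ids` are hypothetical.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical layout: a flat file of little-endian int32 values (e.g. neighbor ids).
ITEM = struct.Struct("<i")

def write_ids(path, ids):
    """Write ids to disk as packed int32 values."""
    with open(path, "wb") as f:
        for i in ids:
            f.write(ITEM.pack(i))

def read_ids(path, start, count):
    """Read `count` ids beginning at index `start` without reading the whole file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # unpack_from reads directly from the mapped buffer; the OS pages in
            # only the parts of the file that back this slice.
            return [
                ITEM.unpack_from(mm, (start + k) * ITEM.size)[0]
                for k in range(count)
            ]

path = os.path.join(tempfile.mkdtemp(), "edges.bin")
write_ids(path, range(1000))
window = read_ids(path, 500, 4)  # ids at positions 500..503
```

The design point is that the slice is addressed by byte offset, so the cost of a lookup depends on the window size rather than the file size.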
32 comments
Posted in r/MachineLearning by u/Benlus on 3/14/2026
333

The arXiv is separating from Cornell University, and is hiring a CEO, who will be paid roughly $300,000/year. "After decades of productive partnership with Cornell University, and with support from the Simons Foundation, arXiv is establishing itself as an independent nonprofit organization"

The community generally supports arXiv's decision to separate from Cornell University and establish itself as an independent nonprofit. Many highlight the platform's pivotal role in democratizing access to scientific publications. Concerns were raised about the new CEO's remuneration, with some arguing that the funds could be better utilized for platform development. However, others countered, emphasizing the importance of attracting top leadership talent. Overall, there's optimism about arXiv's future, but also a call for transparency in its operations and financial management.
69 comments
Posted in r/MachineLearning by u/kdfn on 3/11/2026
281

[D] Can we stop glazing big labs and universities?

Discussion
The research community on Reddit expressed concern about the disproportionate credit given to large labs and universities in machine learning research, highlighting the need to judge research on its own merit. They pointed out that large organizations' significant advertising budgets influence public perception, and this bias also extends to peer review, where studies from well-known institutions often receive less scrutiny. Participants expressed skepticism towards papers from less established sources and remarked on the media hype around research from resource-rich organizations. The role of preprint culture on platforms like arXiv was lauded for democratizing the field. The overall sentiment was critical of the prevailing practices, calling for more nuanced critique and evaluation of research work.
40 comments
Posted in r/MachineLearning by u/casualcreak on 3/13/2026
211

[D] What is even the point of these LLM benchmarking papers?

Discussion
The academic and tech communities expressed mixed sentiment about the relevance of Large Language Model (LLM) benchmarking papers, given how frequently LLMs are updated and deprecated. Critics argue that these papers, often published at conferences like NeurIPS and ICLR, are more about publication than improvement, and that their findings quickly become outdated. Some suggest a return to journals for meaningful results. However, supporters see merit in the datasets these papers produce, which can help catch regressions in real-world applications. They also argue that these papers are important for measuring and understanding the capabilities and risks of LLMs, despite the challenge of keeping pace with constant model updates.
62 comments
Posted in r/dataengineering by u/PR4DE on 3/10/2026
166

Embarrassing 90% cost reduction fix

Blog
The creator of an uptime monitoring service shared their journey of reducing BigQuery costs by 90% through strategic changes. The main lessons included using BigQuery tools and methods effectively, such as using DATE partitioning and a 90-day partition expiration; moving to Firestore for caching to avoid cache wipes on serverless infrastructure; and utilizing functions and Firestore with BigQuery to perform cost-effective data aggregation for reports and real-time dashboards. The sentiment was positive, with the community appreciating the shared insights and humorously suggesting the author reward themselves for the significant cost savings. There was also interest in the reasons behind choosing Firestore for caching over other methods.
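To see why DATE partitioning and a 90-day expiration cut costs, here is a toy Python model. It is not BigQuery's API and the data is synthetic; the class `PartitionedTable` and its methods are hypothetical stand-ins. It only illustrates that expiring old partitions shrinks storage and that a date-bounded query reads just the partitions in range.

```python
from collections import defaultdict
from datetime import date, timedelta

EXPIRATION_DAYS = 90  # mirrors the 90-day partition expiration from the post

class PartitionedTable:
    """Toy stand-in for a DATE-partitioned table: one bucket of rows per day."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def insert(self, day, row):
        self.partitions[day].append(row)

    def expire(self, today):
        """Drop every partition older than the expiration window."""
        cutoff = today - timedelta(days=EXPIRATION_DAYS)
        for day in [d for d in self.partitions if d < cutoff]:
            del self.partitions[day]

    def scan(self, start, end):
        """Return rows from partitions in [start, end] and how many rows were read."""
        rows = [
            row
            for day, bucket in self.partitions.items()
            if start <= day <= end
            for row in bucket
        ]
        return rows, len(rows)

today = date(2026, 3, 15)
table = PartitionedTable()
for offset in range(365):  # one synthetic uptime check per day for a year
    table.insert(today - timedelta(days=offset), {"status": "up"})

total_before = sum(len(b) for b in table.partitions.values())
table.expire(today)
total_after = sum(len(b) for b in table.partitions.values())
_, scanned = table.scan(today - timedelta(days=6), today)  # last 7 days only
```

In this model a weekly report touches 7 rows instead of all stored rows, which is the same effect that partition pruning has on bytes scanned.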
30 comments
Posted in r/dataengineering by u/Tender_Figs on 3/13/2026
155

I Love Analytics Engineering

Career
A passionate discussion about analytics engineering highlighted the importance of understanding the business that drives the data, rather than focusing solely on infrastructure and data principles. Some participants cautioned that corporate politics often influence business logic, making the business side frustrating at times. Others noted the field's future-proof nature due to the complexity of business issues and the current limitations of AI. They also appreciated the balance of technical and business skills required in the role, which offers a unique niche for graduates/juniors. However, several commentators expressed nostalgia for their analytics days, stressing that team dynamics and company culture greatly influence job satisfaction. The discussion was generally positive.
23 comments
Posted in r/dataengineering by u/querylabio on 3/14/2026
153

5 BigQuery features almost nobody knows about

Blog
The thread discussed lesser-known features of BigQuery, which can simplify and enhance SQL queries. These include dropping parentheses from current time functions, using 'UNION ALL BY NAME' for matching columns by name rather than position, and chained function calls for easier reading. Users also appreciated the 'ANY_VALUE(x HAVING MAX y)' function and 'WITH expressions,' which name intermediate values within a single expression, reducing the need for sub-expressions or CTEs for a single column. The sentiment was predominantly positive, with users expressing appreciation for these tips, and sharing their own favorite features.
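Two of these features are easiest to grasp by what they compute. The Python sketch below mimics the semantics of 'UNION ALL BY NAME' and 'ANY_VALUE(x HAVING MAX y)' on plain dictionaries. It is not BigQuery code: the helper names and sample rows are made up, and it assumes both inputs have the same set of column names.

```python
def union_all_by_name(left, right):
    """Concatenate two row sets, matching columns by name rather than position."""
    columns = list(left[0].keys())  # column order is taken from the first input
    return [{c: row[c] for c in columns} for row in left + right]

def any_value_having_max(rows, x, y):
    """Return x from a row where y is largest, like ANY_VALUE(x HAVING MAX y)."""
    return max(rows, key=lambda row: row[y])[x]

# Synthetic rows; the second input lists its columns in a different order.
q1 = [{"name": "ada", "score": 3}]
q2 = [{"score": 9, "name": "bob"}]

combined = union_all_by_name(q1, q2)
top_name = any_value_having_max(combined, "name", "score")
```

A positional union would have put 9 under "name" for the second row; matching by name avoids that class of silent mix-up.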
33 comments
Posted in r/SQL by u/Needleworkerj9 on 3/10/2026
115

I love SQL!

SQL Server
Many commenters resonate with the original poster's enthusiasm for SQL, often sharing their own positive experiences with the language. The majority find SQL intuitive and even fun, with some attributing this to a natural alignment with certain mindsets. A few dive into the technical aspects, explaining that SQL's power lies in its foundation in set theory and its ability to handle large data sets efficiently. Some touch on the quirks of SQL, like the execution order of SELECT statements. Overall, the sentiment leans heavily positive, with a shared appreciation for SQL's capabilities and role in shaping careers.
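The execution-order quirk can be seen in a small example. The sketch below uses Python's built-in sqlite3 module with an invented `orders` table; it shows that WHERE filters individual rows before grouping, while HAVING filters whole groups afterwards, even though SELECT is written first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 5), ("ann", 60), ("bob", 20), ("bob", 25), ("cy", 100)],
)

# Logical order: FROM -> WHERE -> GROUP BY -> HAVING -> SELECT -> ORDER BY.
# WHERE drops ann's 5 before any sums are computed;
# HAVING then drops bob, whose remaining rows total only 45.
rows = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount >= 20
    GROUP BY customer
    HAVING SUM(amount) >= 50
    ORDER BY total DESC
    """
).fetchall()
```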
38 comments
Posted in r/SQL by u/ExchangeFar6292 on 3/15/2026
74

Its everywhere I look…

SQL Server
4 comments
Posted in r/SQL by u/maglunch on 3/13/2026
70

Question: What kind of join technique is this?

SQL Server
The Reddit community engaged in a vibrant discussion about a specific join technique in SQL. The join technique, initially thought to be an implicit join, received mostly negative feedback. Many participants advised against it due to its unconventional and complex nature, while some warned of potential professional consequences. However, a few commentators confirmed that this syntax is valid in SQL Server and other modern ANSI databases, despite not being widely used. Some suggested alternatives such as using a subquery or nested joins. The overall sentiment was mixed but leaned towards discouragement of using this specific technique.
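The summary does not reproduce the query from the thread, so the following sketch does not show the syntax that was debated. It only illustrates, with Python's built-in sqlite3 module and invented tables, why the grouping of joins can change the result, using the subquery alternative that commenters suggested.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    CREATE TABLE shipments (order_id INTEGER, carrier TEXT);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO orders VALUES (10, 1), (11, 2);
    INSERT INTO shipments VALUES (10, 'ups');
    """
)

# Flat chain: the trailing INNER JOIN removes bob, whose order never shipped.
flat = conn.execute(
    """
    SELECT c.name, s.carrier
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    INNER JOIN shipments AS s ON s.order_id = o.id
    ORDER BY c.name
    """
).fetchall()

# Subquery: orders and shipments are joined first, then LEFT JOINed as one unit,
# so every customer is kept and unshipped ones get NULL for carrier.
nested = conn.execute(
    """
    SELECT c.name, os.carrier
    FROM customers AS c
    LEFT JOIN (
        SELECT o.customer_id, s.carrier
        FROM orders AS o
        INNER JOIN shipments AS s ON s.order_id = o.id
    ) AS os ON os.customer_id = c.id
    ORDER BY c.name
    """
).fetchall()
```

The subquery form makes the intended grouping explicit, which is likely why commenters preferred it over less familiar syntax.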
83 comments
Posted in r/SQL by u/FussyZebra26 on 3/15/2026
40

A free SQL practice tool focused on varied repetition

MySQL
The community gave mixed feedback on a new free SQL practice tool built around varied repetition. While some users find it very helpful and practical for real-world scenarios, others find it confusing and criticize the inefficiency of the suggested solutions. More experienced SQL users pointed out unnecessary subqueries in the proposed solutions, which could lead to inefficient queries. The creator is receptive to feedback and continues to make improvements. According to one user, the SQL skills used most in daily work include JOINs, WHERE filters, GROUP BY, CASE, and fixing broken queries.
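As a sketch of the kind of rewrite the experienced users had in mind, the example below compares a correlated subquery with an equivalent JOIN using Python's built-in sqlite3 module. The tables and queries are invented for illustration and are not taken from the tool; whether the JOIN is actually faster depends on the database engine and its optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE departments (id INTEGER, name TEXT);
    CREATE TABLE employees (id INTEGER, name TEXT, dept_id INTEGER);
    INSERT INTO departments VALUES (1, 'data'), (2, 'ops');
    INSERT INTO employees VALUES (1, 'ann', 1), (2, 'bob', 2), (3, 'cy', 1);
    """
)

# Correlated scalar subquery: conceptually evaluated once per employee row.
with_subquery = conn.execute(
    """
    SELECT e.name,
           (SELECT d.name FROM departments AS d WHERE d.id = e.dept_id) AS dept
    FROM employees AS e
    ORDER BY e.id
    """
).fetchall()

# JOIN: states the relationship once. The two queries agree here because every
# employee has a matching department; otherwise a LEFT JOIN would be needed.
with_join = conn.execute(
    """
    SELECT e.name, d.name AS dept
    FROM employees AS e
    JOIN departments AS d ON d.id = e.dept_id
    ORDER BY e.id
    """
).fetchall()
```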
14 comments