data subTLDR week 27 year 2025
r/MachineLearning, r/dataengineering, r/SQL
Optimizing SQL Queries with 'EXPLAIN ANALYZE', Debating AI's Guessing Game, Navigating SQL Interview Questions, Data Cleaning Woes, and the 2025 Open Source Tech Stack
Week 27, 2025
Posted in r/SQL by u/nikkiinit • 7/3/2025
888
I wrote one SQL query. It ran for 4 hours. I added a single index. It ran in 0.002 seconds.
PostgreSQL
The discussion highlights the value of PostgreSQL's 'EXPLAIN ANALYZE' command for diagnosing and optimizing slow queries. The consensus is that adding an index often speeds up a query but is not a guaranteed fix. Understanding execution plans is crucial, and some problems are better solved by rewriting a poorly written query or re-architecting the process and logic. Patience also came up, with some participants sharing experiences of letting queries run for hours or even days. Resources for further learning, such as 'use-the-index-luke.com', were recommended. The overall sentiment is instructive and positive.
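To make the index effect concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is a stand-in, not the original poster's setup: the post concerns PostgreSQL, where the equivalent step is prefixing the query with EXPLAIN ANALYZE. SQLite's 'EXPLAIN QUERY PLAN' plays that role here, and the table, column, and index names are hypothetical.

```python
import sqlite3

# Hypothetical table; the original post's schema is not known.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, i * 0.5) for i in range(10_000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = 42"


def plan(sql):
    """Return SQLite's query plan as one string (the last column holds the detail text)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Without an index, the planner has to scan the whole table.
plan_before = plan(QUERY)

# After adding an index on the filtered column, it can search the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = plan(QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording varies between SQLite versions, but the first plan should report a table scan and the second an index search. As the thread notes, reading the plan first is the point: an index only helps when the plan shows the time is going into a scan that the index can replace.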
Posted in r/dataengineering by u/victorviro • 7/5/2025
517
When data cleaning turns into a full-time chase
Meme
The discussion revolves around the challenges and frustrations of data cleaning, considered by many as a time-consuming but essential task. Respondents widely agree on the importance of clean data for reliable analysis. They emphasize the need for better tools and methodologies to streamline the process. The sentiment is largely negative due to the tedious nature of the task, although there's recognition of its significance in generating accurate results. Despite the grumbles, there's a shared understanding that data cleaning is a fundamental part of the data analysis process that can't be overlooked.
Posted in r/dataengineering by u/DataCraftsman • 7/4/2025
464
2025 Open Source Tech Stack
Open Source
Several technical professionals engaged in a lively discussion about recommended open-source tools for various use cases. A majority of users, based on upvote count, advocated for FastAPI over Flask due to its better support for async work and built-in validation and documentation features. There were differing opinions on Docker's open-source status, with some suggesting a switch to Podman. However, others clarified that Docker itself is open source, while only some related software like Docker Desktop is proprietary. Some participants criticized the recommendations, questioning the inclusion of certain tools in the machine learning stack and expressing confusion over the map. The general sentiment was mixed, with useful insights interspersed with criticism and suggestions for improvement.
Posted in r/MachineLearning by u/moji-mf-joji • 7/6/2025
201
[D] Remembering Felix Hill and the pressure of doing AI research
Discussion
The AI research community is reflecting on the pressures of academia, amplified by a detailed account of personal experiences in graduate school for Natural Language Processing (NLP). The post sparked a vulnerable conversation, with many users resonating with the struggle and expressing appreciation for the shared story. Meanwhile, some highlighted the potential of AI in mental health, urging that its implementation must be undertaken ethically, prioritizing safety and augmenting, not replacing, human care. The loss of Felix Hill, a figure in the field, was mourned, with users recognizing the importance of addressing mental health issues in academia. Overall, the sentiment was mixed but leaned towards supportive and empathetic.
Posted in r/SQL by u/Analyst2163 • 7/2/2025
147
AI is basically guessing, and doesn't really know the answer
Discussion
AI tools, such as Claude, while praised for their eloquence and speed, are seen as fundamentally flawed due to their lack of comprehension and accuracy. They are perceived as relying heavily on sounding correct rather than being correct, which can lead to miscommunication and misinformation. The community suggests that AI should not be blindly trusted, especially with detailed and specific queries, and its responses need verification. Despite this, some users believe there's potential for AI in tasks like structuring long code or comments. However, the AI's tendency to change the code itself is seen as problematic. The sentiment is predominantly critical.
Posted in r/MachineLearning by u/guohealth • 7/3/2025
132
[D] AI/ML interviews being more like SWE interviews
Discussion
AI/ML/DS job interviews are increasingly resembling software engineering (SWE) interviews, focusing more on data structures, algorithms, and coding skills. This reflects a shift in the industry towards seeing AI as an integral part of software, rather than a separate field for creative research. The primary role of AI, ML, and DS professionals is now the execution and integration of learning models into existing systems. However, some roles, particularly in leading industrial research labs, still involve significant research and deep theoretical knowledge. This trend has been met with mixed responses, with some expressing concern over the potential dilution of specialized AI/ML roles.
Posted in r/MachineLearning by u/Nice-Comfortable-650 • 7/6/2025
127
[P] We built this project to increase LLM throughput by 3x. Now it has been adopted by IBM in their LLM serving stack!
Project
LMCache, an open-source project aimed at reducing repeated computation in LLM inference systems, has been adopted by IBM. According to the post, it increases throughput in chat applications by 3x by efficiently offloading large KV cache data to DRAM and disk and loading it back, a significant improvement for multi-round QA settings where context reuse is important but GPU memory is limited. Some users questioned whether the project could affect task performance or accuracy; the consensus is that it primarily saves time without impacting the inference process. Others asked how efficient this caching framework is compared with alternatives, and what performance penalty comes from offloading to RAM or disk.
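The offload-and-reload idea can be illustrated with a toy two-tier cache. This is only a sketch of the general technique, not LMCache's actual API or implementation: the class name, the LRU eviction policy, and the use of byte strings in place of real KV tensors are all assumptions made for illustration.

```python
from collections import OrderedDict
import hashlib


class TieredKVCache:
    """Toy two-tier cache: a small fast tier (standing in for GPU memory)
    and an unbounded slow tier (standing in for DRAM or disk)."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast = OrderedDict()  # least recently used entry first
        self.slow = {}

    @staticmethod
    def key_for(prompt_prefix):
        # Identify reusable context by a hash of the prompt prefix.
        return hashlib.sha256(prompt_prefix.encode("utf-8")).hexdigest()

    def put(self, prompt_prefix, kv_blob):
        key = self.key_for(prompt_prefix)
        self.slow.pop(key, None)
        self.fast[key] = kv_blob
        self.fast.move_to_end(key)
        while len(self.fast) > self.fast_capacity:
            old_key, old_blob = self.fast.popitem(last=False)
            self.slow[old_key] = old_blob  # offload instead of discarding

    def get(self, prompt_prefix):
        key = self.key_for(prompt_prefix)
        if key in self.fast:
            self.fast.move_to_end(key)
            return self.fast[key], "fast"
        if key in self.slow:
            blob = self.slow.pop(key)
            self.put(prompt_prefix, blob)  # load back into the fast tier
            return blob, "slow"
        return None, "miss"  # caller would have to recompute


cache = TieredKVCache(fast_capacity=2)
cache.put("chat-1 turn-1", b"kv1")
cache.put("chat-2 turn-1", b"kv2")
cache.put("chat-3 turn-1", b"kv3")  # fast tier is full, so chat-1 is offloaded

first_lookup = cache.get("chat-1 turn-1")   # served from the slow tier, then promoted
second_lookup = cache.get("chat-1 turn-1")  # now served from the fast tier
print(first_lookup, second_lookup)
```

In this sketch a slow-tier hit still avoids recomputation and only pays a transfer cost, which is the trade-off commenters asked about when they raised the penalty of offloading to RAM or disk.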
Posted in r/SQL by u/Various_Candidate325 • 7/1/2025
72
Got this SQL interview question and how you'd answer it
Discussion
The Reddit community shared diverse perspectives on the SQL interview question about investigating a 0% conversion rate for a product. The top-rated comment emphasized the importance of not immediately assuming the issue lies in the SQL. Other top comments suggested first verifying whether the 0% rate is accurate based on the available data and understanding the context of 'conversion' in the scenario. A step-by-step, methodical approach was also suggested, including tracing the issue backwards, checking for anomalies in the data, and asking clarifying questions to better understand the issue. The discussion was largely constructive, with users debating whether the question tests SQL logic or communication structure.
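The "verify, then trace backwards" approach from the comments might look like the following sketch. Everything in it is hypothetical: the events table, its columns, the sample rows, and the inconsistent-casing cause are invented to illustrate the method and are not taken from the interview question.

```python
import sqlite3

# Hypothetical event log; product "A" is the one reporting 0% conversion.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (product_id TEXT, event_type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [
        ("A", "view"),
        ("A", "view"),
        ("A", "Purchase"),  # inconsistent casing, invented for this example
        ("B", "view"),
        ("B", "purchase"),
    ],
)


def count(where, params=()):
    """Count rows in events matching a WHERE clause."""
    sql = f"SELECT COUNT(*) FROM events WHERE {where}"
    return conn.execute(sql, params).fetchone()[0]


# Step 1: reproduce the reported number and check the denominator is not empty.
views = count("product_id = ? AND event_type = 'view'", ("A",))
purchases = count("product_id = ? AND event_type = 'purchase'", ("A",))
reported_rate = purchases / views if views else None

# Step 2: trace backwards by looking at the raw values the filter depends on.
distinct_types = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT event_type FROM events WHERE product_id = ? ORDER BY event_type",
        ("A",),
    )
]

# Step 3: re-count with a normalized comparison to test the anomaly hypothesis.
purchases_normalized = count(
    "product_id = ? AND LOWER(event_type) = 'purchase'", ("A",)
)

print(reported_rate, distinct_types, purchases_normalized)
```

Here the 0% figure turns out to be a data-quality issue rather than a real absence of purchases, which matches the thread's advice to confirm the number and the definition of 'conversion' before assuming either the SQL or the product is at fault.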
Posted in r/SQL by u/Independent-Sky-8469 • 7/6/2025
52
In terms of SQL projects
Discussion
The consensus among experienced SQL users is that while the basics may seem simple, real-world application of SQL can be complex and challenging. Learning SQL in depth truly begins when dealing with actual stakeholders, business requirements, and deadlines. Dealing with real-world constraints, integrating with existing data or code, and optimizing and debugging are common in the field. Users also highlight the importance of understanding datasets, as they often contain irregularities. Furthermore, working in large corporations with complex business needs provides a unique challenge that can make SQL seem less simple. There's also a need to interpret stakeholder requests accurately, a skill that requires practice and cannot be acquired through pet projects or online courses alone.