data subTLDR week 48 year 2025
r/MachineLearning, r/dataengineering, r/SQL
Mastering SQL Challenges: Engaging Disinterested Students, Learning from $2M Mistakes, Navigating Newbie Mishaps, Balancing Python with Other Languages, and Building a Supportive Community Amidst Errors
Week 48, 2025
Posted in r/MachineLearning by u/diyer22 • 11/27/2025
1395
[D] Got burned by an Apple ICLR paper — it was withdrawn after my Public Comment.
Discussion
A researcher shared their experience with a flawed Apple paper that was under review for ICLR 2026. The paper, which contained a critical bug and a poor-quality dataset, led the researcher to waste significant time and effort. The paper was withdrawn after the researcher raised the issues in a public comment. The experience highlighted the need for the ML community to remain vigilant and push back against low-quality and irresponsible behavior. Commenters lauded the researcher's efforts while expressing frustration at how common such issues are. Some suggested that reviewers should pay more attention to model-assisted dataset construction and to the full release of source code. They also stressed the importance of reproducing baseline results before building on someone else's numbers.
Posted in r/SQL by u/tits_mcgee_92 • 11/26/2025
252
The most difficult part about teaching students: some of them just don't care about SQL.
Discussion
SQL, despite its perceived complexity and niche appeal, is widely appreciated for its power and utility, particularly in handling data. Many users found it a rewarding challenge to master, and it's still considered a critical skill in today's AI-driven world. However, teaching SQL remains challenging due to varying levels of student interest and the temptation to use AI as a crutch. Expertise in SQL, it's agreed, goes beyond basic understanding to a deep appreciation of set theory and statistics. Despite the difficulties, the joy of seeing students succeed and find careers using SQL remains a driving force for educators.
Posted in r/dataengineering by u/Comfortable_Onion318 • 11/29/2025
198
i messed up :(
Discussion
The majority of commenters expressed empathy for the original poster's mistake, emphasizing that everyone makes errors, especially in high-stress situations. Many shared their own experiences with similar mishaps, fostering a sense of community. There was a strong consensus on the importance of data backups and safeguards against human error, with several users stressing the role of management in implementing such systems. Suggestions included introducing confirmation prompts before major changes and regular audits of jobs and servers. Despite the negative situation, the overall sentiment leaned towards constructive advice and emotional support.
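The confirmation-prompt suggestion can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names `confirm_destructive` and `drop_table` are hypothetical, and the destructive step is a placeholder rather than anything described in the thread.

```python
def confirm_destructive(action, target, ask=input):
    """Return True only if the operator retypes the exact target name."""
    typed = ask(f"About to {action} '{target}'. Type the name to confirm: ")
    return typed.strip() == target


def drop_table(name, ask=input):
    """Hypothetical destructive operation guarded by a confirmation prompt."""
    if not confirm_destructive("drop table", name, ask):
        return "aborted"
    # The real destructive call would go here; it is omitted on purpose.
    return f"dropped {name}"
```

Requiring the operator to retype the target name, rather than press "y", makes it harder to confirm a change against the wrong table or server by reflex. The `ask` parameter exists so the prompt can be replaced in automated tests.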
Posted in r/SQL by u/Bubbly-Group-4497 • 11/30/2025
195
I don't understand the difference
Discussion
The discussion centers on the difference between two SQL queries and how each handles NULL values and specific value matches or mismatches. The first query excludes records with a specific value (such as customers who bought a laptop), while the second includes records that have any other value (customers who bought anything but a laptop). Commenters also noted that NULL marks a missing value and is handled differently in SQL: a comparison with NULL evaluates to UNKNOWN rather than TRUE, so a WHERE clause filters those rows out. Users suggested adding a second condition for NULLs or using functions like COALESCE() or IFNULL(). The general consensus is that the queries often return the same result but can diverge on specific cases such as NULLs.
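The NULL behavior can be reproduced with a small sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and its rows are hypothetical and invented for illustration; they are not the queries from the thread.

```python
import sqlite3

# Hypothetical table: the thread's actual schema is not known.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, product TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", "laptop"), ("bob", "phone"), ("cat", None)],
)


def customers(where_clause):
    """Return customer names matching the given WHERE clause, sorted."""
    sql = f"SELECT customer FROM orders WHERE {where_clause} ORDER BY customer"
    return [row[0] for row in conn.execute(sql)]


# For the NULL row, product <> 'laptop' is UNKNOWN, so WHERE drops it.
plain = customers("product <> 'laptop'")

# Option 1: add an explicit second condition for NULLs.
with_is_null = customers("product <> 'laptop' OR product IS NULL")

# Option 2: COALESCE swaps NULL for a placeholder before comparing.
with_coalesce = customers("COALESCE(product, '') <> 'laptop'")

print(plain)          # ['bob']
print(with_is_null)   # ['bob', 'cat']
print(with_coalesce)  # ['bob', 'cat']
```

The row with a NULL product only appears once the query handles NULL explicitly, which is the case where the two styles of query stop agreeing.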
Posted in r/MachineLearning by u/Derpirium • 11/28/2025
170
[D] ICLR reviewers being doxed on OpenReview
Discussion
A significant privacy breach has occurred on OpenReview, impacting reviewers for the ICLR conference. Users were doxed (personal information published without consent) after rejecting papers, causing a considerable blow to the integrity of the conference and OpenReview. Despite the offending comments being removed and the burner account deleted, the repercussions are ongoing. Concerns are raised about the permanent presence of this leaked data on the internet and its continued circulation on platforms like social media and GitHub. The situation is described as a toxic combination of OpenReview's security lapse and a record number of submissions, exacerbating existing tensions.
Posted in r/MachineLearning by u/Dangerous-Hat1402 • 11/27/2025
140
[D] Openreview All Information Leaks
Discussion
The OpenReview platform experienced a bug that revealed the identities of authors, reviewers, and area chairs (ACs), stirring significant discussion. Commenters confirmed the leak and expressed concern about conflicts of interest and inappropriate reviewing practices, particularly reviewers assessing papers outside their expertise. Some noted potential policy violations if the leaked information was used. The incident also raised questions about the efficacy of double-blind review. The bug was quickly fixed, but some users suggested preserving the exposed data for further analysis. Despite some negative reactions, the overall sentiment was mixed, with several users seeing the situation as an opportunity to improve the review system.
Posted in r/dataengineering by u/kalluripradeep • 11/24/2025
100
The pipeline ran perfectly for 3 weeks. All green checkmarks. But the data was wrong - lessons from a $2M mistake
Discussion
The thread highlights the importance of data quality, standardization, and monitoring in production environments. Participants criticized the practice of not adhering to ISO date standards, which led to confusion and errors. Many emphasized the need for uniqueness tests to avoid issues like double-counting customers, and for a standardized data format, especially for dates. Commenters also stressed monitoring for silent schema changes and currency confusion, suggesting methods such as tracking distributions over time, proactive schema change monitoring, and comparing against historical baselines. Users also advised validating data before use, even in a staging database for analytics pipelines. The overall sentiment was mixed, with users offering constructive criticism and advice on better practices.
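A few of the checks mentioned in the thread can be sketched in plain Python. This is a minimal illustration, not the original poster's pipeline: the field names `customer_id`, `order_date`, and `amount`, the sample rows, and the helper names are all assumptions.

```python
import re
from collections import Counter
from datetime import date

ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def duplicate_ids(rows, key="customer_id"):
    """Return ids that appear more than once (a basic uniqueness test)."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def non_iso_dates(rows, key="order_date"):
    """Return date strings that are not valid YYYY-MM-DD calendar dates."""
    bad = []
    for row in rows:
        value = row[key]
        try:
            if not ISO_DATE.fullmatch(value):
                raise ValueError(value)
            date.fromisoformat(value)  # rejects impossible dates like 2025-13-40
        except ValueError:
            bad.append(value)
    return bad


def mean_shift(current, baseline):
    """Relative change of the current mean against a historical baseline mean."""
    cur = sum(current) / len(current)
    base = sum(baseline) / len(baseline)
    return (cur - base) / base


# Hypothetical sample batch.
rows = [
    {"customer_id": 1, "order_date": "2025-11-24", "amount": 100.0},
    {"customer_id": 2, "order_date": "11/25/2025", "amount": 120.0},
    {"customer_id": 1, "order_date": "2025-11-26", "amount": 110.0},
]

print(duplicate_ids(rows))  # [1]
print(non_iso_dates(rows))  # ['11/25/2025']
```

A job can finish with every task green while checks like these fail, which is why they run against the data rather than the job status. A large `mean_shift` against a historical baseline is one simple signal that a unit or currency may have changed upstream.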
Posted in r/dataengineering by u/deputystaggz • 11/25/2025
95
Are data engineers being asked to build customer-facing AI “chat with data” features?
Discussion
Many data engineers report being tasked with developing customer-facing AI features, often due to their unique skill sets and the merging boundaries between data engineering and customer-facing roles. However, some consider it a hollow request, as no models can fully explore the vast data an enterprise produces. Despite this, engineers are making progress, with one example involving the creation of a frontend API to expose a BI tool's AI assistant. There's also a trend toward creating secure and flexible data models to ensure data integrity and adaptability. Despite some resistance, these projects are becoming more common as companies don't want to miss out on AI advancements.
Posted in r/SQL by u/Primary_Sherbert • 11/27/2025
64
Newbie - ran stored procedure with a rollback transaction
SQL Server
Two SQL newbies triggered a long-running rollback transaction, initially causing panic and operational disruptions. However, the situation was eventually resolved with a server reset, which proved safe and effective. The incident sparked discussions on responsibility and learning opportunities. Most respondents agreed that such mishaps are common in the field; even experienced professionals sometimes cause database crashes. They emphasized the importance of understanding blocking sessions and how to kill them, owning mistakes, and learning from them. The incident also highlighted the necessity of restricting access for beginners to prevent similar situations. Overall, the sentiment was understanding and supportive, with a focus on continuous learning.
Posted in r/SQL by u/SightSmash • 11/24/2025
44
What programming language should I learn alongside SQL?
Discussion
The consensus among Reddit users is that Python is the best programming language to learn alongside SQL because of its versatility in data analysis, data engineering, and admin automation. Others suggest C# for its smooth integration with the Microsoft ecosystem. Procedural SQL dialects such as T-SQL, PL/pgSQL, and PL/SQL are also recommended for those wanting to deepen their SQL skills. Some users emphasized tailoring the choice to specific interests and requirements, mentioning options like PHP, Node.js, Ruby, Rust, Go, Zig, Java, Swift, DAX, and KNIME. The sentiment leans towards a tailored approach to language selection.