data subTLDR week 49 year 2025

r/MachineLearning, r/dataengineering, r/SQL

Relearning SQL to Inspire 5,000 Users, Unpacking Null in Queries, Airflow's Overheating Rooms, and Demystifying API Integration

Week 49, 2025
Posted in r/dataengineering by u/aleda145, 12/2/2025
1179

Airflow makes my room warm

Meme
A discussion of the resources consumed by Airflow, a workflow scheduler, reveals user frustration with its high memory and CPU usage. Users describe their machines running near capacity with only a few applications open. Some call for a lightweight orchestration tool as an alternative to Airflow and similar tools, which are resource-intensive even though they only schedule jobs. Some users nonetheless remain hopeful about container-based development with tools such as Docker, despite its high resource demands. Overall, the sentiment leans toward dissatisfaction with the current state of resource management.
43 comments
Posted in r/SQL by u/0thSpider, 12/6/2025
989

Wonderful

Resolved
55 comments
Posted in r/SQL by u/TurbulentCountry5901, 12/5/2025
572

I started this to relearn SQL. A month later it hit 5,000 users. Thank you.

Discussion
The creator of [sqlcasefiles.com](http://sqlcasefiles.com), an SQL learning website, received positive feedback from users a month after its launch. The platform, which started as a tool to help the creator relearn SQL, has passed 5,000 users and introduced a new feature called the Case Vault. Users commend the site for its fun learning environment, though some suggest improvements such as fixing case sensitivity issues and not clearing the board after each question. Users also encourage the creator to state which SQL variant the site uses, since the ambiguity could confuse beginners. Nonetheless, the overall sentiment remains positive, with some users even considering incorporating the site into their SQL teaching curriculum.
29 comments
Posted in r/MachineLearning by u/WhiteBear2018, 12/2/2025
270

[D] Published paper uses hardcoded seed and collapsed model to report fraudulent results

Discussion
The discussion concerns a published paper whose authors allegedly used a hardcoded seed and a collapsed model to present misleading results. The main concern is that the authors reported results from two different models as if they came from the same model, skewing the reported accuracy. When the seed was changed, the models often collapsed to predicting a single label. Commenters quickly dismissed the authors' defense as an attempt to justify a flawed methodology, and the authors' repository was later removed. The thread reflects strong disapproval of such unethical practices and calls for greater transparency in how experiments are run and reported.
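As a rough illustration of the check described in the thread, the sketch below reruns an experiment under several seeds and flags runs whose predictions fall almost entirely on one label. It is not the paper's code or the poster's script: `train_and_predict`, the toy stand-in, and the 0.99 threshold are all assumptions made for the example.

```python
import random
from collections import Counter


def majority_share(predictions):
    """Fraction of predictions that fall on the single most common label."""
    counts = Counter(predictions)
    return max(counts.values()) / len(predictions)


def is_collapsed(predictions, threshold=0.99):
    """Flag a run whose predictions are (almost) all one label.

    The 0.99 threshold is an arbitrary choice for this sketch.
    """
    return majority_share(predictions) >= threshold


def evaluate_across_seeds(train_and_predict, seeds):
    """Call the hypothetical train_and_predict(seed) once per seed."""
    report = {}
    for seed in seeds:
        preds = train_and_predict(seed)
        report[seed] = {
            "majority_share": majority_share(preds),
            "collapsed": is_collapsed(preds),
        }
    return report


def toy_train_and_predict(seed):
    """Toy stand-in for a real training run: even seeds collapse to label 0."""
    if seed % 2 == 0:
        return [0] * 100
    labels = [0] * 50 + [1] * 50
    random.Random(seed).shuffle(labels)  # balanced labels in a seed-dependent order
    return labels


report = evaluate_across_seeds(toy_train_and_predict, seeds=[0, 1, 2, 3])
for seed, stats in report.items():
    print(seed, stats)
```

Reporting a per-seed table like this, rather than a single number from one fixed seed, makes a collapsed run visible instead of hiding it behind an aggregate accuracy figure.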
64 comments
Posted in r/dataengineering by u/Advanced-Average-514, 12/4/2025
250

Can't you just connect to the API?

Meme
The thread reveals frustration among tech professionals about misconceptions surrounding API integration. Many non-technical stakeholders perceive it as a one-click solution for data transfer, while in reality it often involves complex problem-solving and ongoing management. Several comments also highlight the challenges of flat file ingestion and frequent changes to SFTP folder structures, which create additional work and lead to access issues. Some also lament APIs that return only one record per request, which makes handling large data volumes difficult. The overall sentiment is a mix of humor, exasperation, and resignation toward the complexity and misunderstanding surrounding API connections.
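One common way to cope with a one-record-per-request API is to issue the calls concurrently with a bounded worker pool. The sketch below is only an illustration: `fetch_record` stands in for whatever single-record call a real API exposes, `fake_fetch_record` is a hypothetical placeholder, and the worker count is an arbitrary assumption that a real integration would tune against the provider's rate limits.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(record_ids, fetch_record, max_workers=8):
    """Call a one-record-at-a-time fetch function concurrently.

    Results come back in the same order as record_ids.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_record, record_ids))


def fake_fetch_record(record_id):
    """Hypothetical stand-in for a real HTTP request returning one record."""
    return {"id": record_id, "value": record_id * 2}


records = fetch_all(range(5), fake_fetch_record)
print(records)
```

Even with concurrency, retries, rate limiting, and partial failures still have to be handled, which is part of why "just connect to the API" understates the work.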
75 comments
Posted in r/MachineLearning by u/anikpramanikcse, 12/6/2025
203

[D] Top ICLR 2026 Papers Found with fake Citations — Even Reviewers Missed Them

News
Fake citations have been discovered in top-tier ICLR 2026 papers, and reviewers failed to catch them. Out of 300 submissions scanned, 50 instances of this misconduct were found. The community is concerned about the integrity of the review process, with calls for stricter scrutiny and possibly penalties for those involved. The incident has sparked a broader conversation about the pressure to publish in academia and the need for more transparent and robust systems to ensure research credibility. The overall sentiment is mixed: shock and disappointment at the situation, but also optimism that detecting these issues will drive change.
30 comments
Posted in r/dataengineering by u/Dense_Car_591, 12/2/2025
182

Taking 165k Offer Over 175k Offer

Career
The poster chose a $165K job offer over a $175K one for a better work culture and compatibility with colleagues. The community response was largely positive, congratulating the decision and emphasizing the importance of a healthy work environment. The post also shared job hunting tips like aligning technical skills with job requirements, practicing coding interviews, maintaining records of project achievements, and showing genuine interest in the role/company. Commenters appreciated these insights, found them helpful, and wished the poster good luck. One hiring manager stressed the importance of keeping an updated portfolio of projects and their impacts, a sentiment that also resonated with others.
23 comments
Posted in r/MachineLearning by u/BetterbeBattery, 12/2/2025
178

[D] On low quality reviews at ML conferences

Discussion
The thread reflects concern in the machine learning community over the dominance of empirical research and the declining quality of reviews at ML conferences such as NeurIPS and ICLR. The consensus is that the predominance of empirical researchers is skewing the reviewing process, leaving less appreciation for rigorous scientific work. Many reviewers lack the skills to evaluate theoretical or conceptual work, which can result in superficial judgments. Commenters say the trend extends to the ACL community and attribute it to a growing number of submissions and a lack of incentives for high-quality reviews. Some suggest the cause is how graduate students are trained, with a focus on empirical results over mathematical rigor. Despite dissatisfaction with the review process, researchers continue to value having their papers accepted at these conferences.
54 comments
