
data subTLDR week 10 year 2026

r/MachineLearning, r/dataengineering, r/SQL

Boost Your SQL Proficiency for Entry-Level Roles, Innovative SQL-on-Canvas Tool Piques Interest, Insights into Live SQL Interviews, Addressing Client's Unrealistic Query Time Expectations, Navigating Layoffs in the Tech Industry

Week 10, 2026
Posted in r/dataengineering by u/wtfzambo, 3/6/2026
292

Client wants <1s query time on OLAP scale. Wat do

Help
A client expects sub-second query times on a massive dataset in Azure Synapse, on a low budget. Sentiment is mixed, though the most upvoted comments call the client's expectations unrealistic. Some commenters suggest investing in a Vertica cluster, which would require a significantly larger budget. Others propose offering an 'options pack' so the client can choose according to budget and performance needs. Technical suggestions include ClickHouse with NVMe and 1TB of memory, partitioning and indexing, and Delta Lake with Databricks.
391 comments
Posted in r/MachineLearning by u/Gazeux_ML, 3/7/2026
267

[P] VeridisQuo - open-source deepfake detector that combines spatial + frequency analysis and shows you where the face was manipulated

Project
The developers of VeridisQuo, an open-source deepfake detection tool, have shared their project. The tool combines spatial and frequency analysis: an EfficientNet-B4 handles the spatial/visual analysis, and a frequency module performs FFT and DCT analysis. The developers highlight the GradCAM integration, which shows in a video the face areas that trigger detection. Training was done on FaceForensics++, and testing indicated that fusing frequency and spatial analysis is particularly effective at exposing high-quality deepfakes. The developers invite constructive feedback and raise the possibility of cross-dataset evaluation in the future. Sentiment around the tool is generally positive.
21 comments
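To make the "frequency analysis" idea concrete, here is a toy sketch. It is not VeridisQuo's code: the project reportedly applies FFT and DCT to face images and fuses the result with a learned model, while this example only computes a 1-D DCT-II by hand and measures how much energy sits in the high-frequency coefficients. The function names, the cutoff, and the sample signals are all hypothetical.

```python
import math

def dct2(signal):
    """Unnormalized 1-D DCT-II of a list of floats."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(signal))
        for k in range(n)
    ]

def high_freq_ratio(signal, cutoff):
    """Share of squared DCT coefficients at index >= cutoff (hypothetical feature)."""
    energy = [c * c for c in dct2(signal)]
    total = sum(energy)
    return sum(energy[cutoff:]) / total if total else 0.0

# A smooth ramp versus the same ramp with an alternating +/-3 perturbation,
# standing in for the fine-grained artifacts a frequency module might look for.
smooth = [float(i) for i in range(16)]
noisy = [float(i) + (3.0 if i % 2 else -3.0) for i in range(16)]

ratio_smooth = high_freq_ratio(smooth, cutoff=8)
ratio_noisy = high_freq_ratio(noisy, cutoff=8)
print(f"smooth: {ratio_smooth:.6f}  noisy: {ratio_noisy:.6f}")
```

The perturbed signal puts far more of its energy into the upper half of the spectrum, which is the kind of cue that is hard to see in pixel space but easy to see after a frequency transform.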
Posted in r/MachineLearning by u/Ok-Preparation-3042, 3/5/2026
226

[D] A mathematical proof from an anonymous Korean forum: The essence of Attention is fundamentally a d^2 problem, not n^2. (PDF included)

Discussion
An anonymous Korean forum post presenting a mathematical proof that the essence of Attention is fundamentally a d^2 problem, not n^2, has sparked a lively debate. The proof argues that the field has misunderstood the intrinsic geometry of Attention and proposes a d^2 Pullback Theorem. Some Reddit users believe the concept is sound and call for expert verification, while others question the paper's conclusions, particularly the polynomial attention proposal. Some also raise practical considerations, arguing that O(nd^3) isn't necessarily better than O(n^2d) given the large dimensions in modern models. Overall, sentiment is mixed, with a focus on the need for expert review.
84 comments
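The practical objection in the thread comes down to simple arithmetic: ignoring constant factors, n·d^3 is smaller than n^2·d only when n exceeds d^2. The sketch below works that out for a couple of illustrative sizes. The function names and the values of n and d are assumptions chosen for illustration, not figures from the paper or the thread.

```python
def quadratic_cost(n, d):
    """Operation count proportional to n^2 * d (constants ignored)."""
    return n * n * d

def cubic_in_d_cost(n, d):
    """Operation count proportional to n * d^3 (constants ignored)."""
    return n * d ** 3

def crossover_length(d):
    """Sequence length above which n * d^3 < n^2 * d, i.e. n > d^2."""
    return d * d

# Illustrative dimensions only (assumed, not taken from the source).
for d in (128, 4096):
    print(f"d={d}: O(nd^3) wins only for n > {crossover_length(d):,}")

n, d = 1_000, 128
print(quadratic_cost(n, d) < cubic_in_d_cost(n, d))  # short context: n^2 d is cheaper
```

Under these assumptions, a dimension of 128 already requires sequences longer than 16,384 tokens before the O(nd^3) form pays off, and the threshold grows quadratically with d.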
Posted in r/MachineLearning by u/lightyears61, 3/6/2026
224

[R] Low-effort papers

Research
The academic community on Reddit expressed concern about the prevalence of low-effort papers, with a specific reference to those merely training new YOLO versions on public datasets and publishing the results. Most respondents agree that while this may not be academic misconduct, it reflects a systemic issue in research incentives. Criticism is directed towards such practices for not contributing novel insights, methodologies, or data. However, some argue that not all research needs to be groundbreaking and that these papers can provide useful benchmarks. The sentiment is largely negative, with calls for changes in academic incentives and more robust peer review processes.
58 comments
Posted in r/dataengineering by u/briogeosucks, 3/5/2026
211

I just got laid off

Rant
Responses to a user's account of being laid off are generally supportive and empathetic. The majority view is not to take layoffs personally, since companies tend to treat employees as assets of varying value. Some users advised using the time to study and seek better opportunities, and reminded the poster that severance is negotiable. Some perceived the current job market as challenging, while others noted an uptick in recruiter outreach. Commenters also pointed out that everyone is replaceable and suggested viewing future employers with the same detachment. A few users highlighted the importance of learning from the experience.
99 comments
Posted in r/dataengineering by u/harrytrumanprimate, 3/4/2026
136

LPT: If you used AI to generate something you share with a coworker, you should proofread it

Rant
There's growing frustration among professionals about the misuse of AI tools to generate content without adequate proofreading. While AI can boost productivity, such tools often produce 'garbage' that isn't validated, shifting the burden onto peers for review. Many feel this misuse creates no net gain in efficiency, and some have begun refusing to engage with AI-generated content unless it's been properly reviewed by the sender. Despite these sentiments, some advocate for clear communication about AI use, such as openly stating when and where AI drafts are being validated. Overall, the sentiment is mixed with strong feelings on both sides.
37 comments
Posted in r/SQL by u/No_Imagination4861, 3/3/2026
60

SQL Proficiency for Entry Level Roles

MySQL
For entry-level data analyst and business analyst roles, a moderate level of SQL proficiency is expected. Basic knowledge such as extracting data from tables, understanding primary and foreign key relationships, and using commands like SELECT, WHERE, GROUP BY, and JOINs is essential. Intermediate skills like handling NULL values, writing subqueries, and creating conditional logic with CASE statements are also beneficial. Users recommended practicing on platforms like LeetCode and StrataScratch for realistic business-focused SQL problems. Understanding how to de-duplicate results, use temporary tables, and implement window functions were also mentioned as valuable skills. The sentiment leans towards balancing technical proficiency with the ability to understand and navigate complex datasets.
18 comments
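The skills listed above fit in two short queries. The sketch below is an illustration only: it uses Python's built-in `sqlite3` module with an in-memory database because that runs anywhere, whereas the thread is tagged MySQL, and the `customers` and `orders` tables and their rows are invented for the example. It assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(id),  -- foreign key to customers
    amount REAL
);
INSERT INTO customers VALUES (1, 'Ada', 'EU'), (2, 'Bo', NULL), (3, 'Cy', 'US');
INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 30.0), (3, 2, 75.0);
""")

# JOIN + GROUP BY, NULL handling with COALESCE, conditional logic with CASE.
summary = conn.execute("""
SELECT c.name,
       COALESCE(c.region, 'unknown') AS region,
       COUNT(o.id) AS n_orders,
       COALESCE(SUM(o.amount), 0) AS total,
       CASE WHEN COALESCE(SUM(o.amount), 0) >= 100 THEN 'high' ELSE 'low' END AS tier
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.id
GROUP BY c.id, c.name, c.region
ORDER BY c.id
""").fetchall()

# De-duplication with a window function: keep only the latest order per customer.
latest = conn.execute("""
SELECT customer_id, amount
FROM (
    SELECT customer_id, amount,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY id DESC) AS rn
    FROM orders
) AS ranked
WHERE rn = 1
ORDER BY customer_id
""").fetchall()

print(summary)
print(latest)
```

The `LEFT JOIN` keeps the customer with no orders in the result, which is why `COALESCE` is needed around `SUM`; an inner join would silently drop that row.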
Posted in r/SQL by u/aleda145, 3/5/2026
55

I've built a tool to run SQL on a canvas. In the video I'm exploring which database has the highest average salary from the stack overflow survey

Discussion
The online tool for running SQL on a canvas is being received positively, with users likening it to Power BI and expressing interest in its source code. Several users are curious about how it works, asking whether the SQL engine runs in the cloud or in the browser and whether the tool is local-first or follows a SaaS model. Responses indicate that it runs locally in the browser by default but can be pushed to remote data sources for live collaboration. Some users are building similar tools, and there is shared interest in whether this one will become a commercial product.
19 comments
Posted in r/SQL by u/matthewhefferon, 3/2/2026
28

Has anybody done a live SQL interview?

PostgreSQL
In live SQL interviews, candidates typically share their screen and write queries live, explaining their thought process and reasoning. The ability to talk through solutions and ask insightful questions often outweighs the importance of a perfect output. Experiences vary, with some candidates receiving vague prompts and dealing with laggy databases, while others are given specific tasks like writing a CTE or explaining indexes and execution plans. Success can depend on adaptability and communication skills, as well as technical knowledge. Some interviewees also faced unexpected challenges like unfamiliar SQL versions or formats, highlighting the importance of preparation and practice.
40 comments
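For practice on the two tasks mentioned above, writing a CTE and explaining indexes and execution plans, a minimal sketch follows. It uses Python's built-in `sqlite3` module as a stand-in, while the thread is tagged PostgreSQL, where `EXPLAIN` output looks different. The `events` table, its rows, and the index name are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT);
INSERT INTO events VALUES
    (1, 1, 'click'), (2, 1, 'view'), (3, 2, 'click'),
    (4, 3, 'view'), (5, 3, 'view'), (6, 3, 'click');
""")

# A CTE names an intermediate result so the final SELECT stays readable.
active_users = conn.execute("""
WITH per_user AS (
    SELECT user_id, COUNT(*) AS n_events
    FROM events
    GROUP BY user_id
)
SELECT user_id, n_events
FROM per_user
WHERE n_events > 1
ORDER BY user_id
""").fetchall()

def plan(sql):
    """Return the human-readable detail column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

lookup = "SELECT * FROM events WHERE user_id = 1"
plan_before = plan(lookup)   # no index on user_id yet: a full table scan
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
plan_after = plan(lookup)    # the planner can now search via the index

print(active_users)
print(plan_before)
print(plan_after)
```

Talking through why the plan changes from a scan to an index search is the kind of reasoning interviewers reportedly value more than a perfect final query.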
