
data subTLDR: Week 13, 2026

r/MachineLearning · r/dataengineering · r/SQL

AI-Generated SQL Code Raises Security Concerns, Machine Learning Model Built Solely with SQL Sparks Interest, Challenges of Importing 1TB JSON into SQL Server, High Volume of Data Engineering Job Applicants Misleading, Data Engineering Tycoon Game Hits the Mark

Posted in r/MachineLearning by u/Fun-Information78 on 3/25/2026 · 268 points

[D] Is LeCun’s $1B seed round the signal that autoregressive LLMs have actually hit a wall for formal reasoning?

Discussion
The $1B seed round for Yann LeCun's startup Logical Intelligence, which aims to bypass Transformers and generate mathematically verified code using Energy-Based Models (EBMs), has sparked debate on the potential shift away from autoregressive Language Model (LLMs). Many believe the massive funding is due to LeCun's reputation rather than the technology itself. Skepticism is high regarding the practicality of EBMs given their complexity and the computational cost of mapping continuous energy landscapes to discrete outputs. However, there's an overall appreciation for the attempt to innovate technical/theoretical aspects in AI, diversifying from the prevalent LLM trend. Sentiments are mixed, with both excitement and caution.
111 comments
View on Reddit →
Posted in r/MachineLearning by u/Open_Budget6556 on 3/29/2026 · 258 points

[P] Built an open source tool to find the location of any street picture

Project
The open-source geolocation tool Netryx Astra V2 has been positively received, with users praising its effectiveness in identifying locations from street pictures, without using LLMs or metadata. Users are encouraged to share their successful searches, though some questioned the tool's potential for misuse. The tool is limited to a 10km radius of New York for its web demo due to GPU costs, but users can index any city with unlimited searches by installing the repo. The creator utilized models from MegaLoc and MASt3R and has credited the referenced papers. Overall, sentiments are predominantly positive.
32 comments
View on Reddit →
Posted in r/SQL by u/ChandanKarn on 3/26/2026 · 138 points

Cursor keeps generating SQL queries like this and it's making me nervous

SQL Server
AI-generated database code poses a significant risk of SQL injection, a trend observed with AI tools like Cursor and Claude. These tools often generate unsecured code that works correctly during testing but is susceptible to malicious manipulation. The community emphasizes the importance of code reviews and continued human supervision of AI-generated code. Experienced developers voiced concern that the widespread use of AI in code creation risks repeating past mistakes in database management. The overall sentiment is mixed, with an undertone of skepticism about blind reliance on AI for tasks requiring a high level of security.
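The pattern the thread warns about fits in a few lines. Below is a minimal sketch (using Python's built-in sqlite3 and an invented `users` table, not code from the thread) of why string-built SQL is injectable while a parameterized query is not:

```python
import sqlite3

# In-memory database with an invented users table for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])

user_input = "alice' OR '1'='1"  # attacker-controlled value

# Unsafe: string interpolation lets the input rewrite the query logic.
unsafe = f"SELECT * FROM users WHERE name = '{user_input}'"
print(len(conn.execute(unsafe).fetchall()))  # 2 -- the OR clause matches every row

# Safe: a parameterized query treats the input as a plain literal value.
safe = "SELECT * FROM users WHERE name = ?"
print(len(conn.execute(safe, (user_input,)).fetchall()))  # 0 -- no such name exists
```

The same placeholder idea carries over to SQL Server via parameterized client-library calls; the thread's point is that catching the unsafe variant is a review habit, not a tooling problem.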
62 comments
View on Reddit →
Posted in r/dataengineering by u/Secret-Fudge-5932 on 3/26/2026 · 122 points

Why are Data Engineering job posts getting thousands of applicants?

Career
The high number of applicants for data engineering roles on LinkedIn may be misleading due to automated applications and unqualified entries. Commenters estimate that around 80% of applicants are either unqualified or using automated tools. Some employers also report an influx of applications from overseas candidates who do not meet eligibility requirements, further inflating the figures. Despite the high volume, finding experienced, qualified data engineers remains a challenge for many employers. Keyword-scanning software is also discussed, with some companies preferring to review applications manually to avoid missing strong candidates. The sentiment is mixed, as the situation presents both challenges and opportunities in the hiring process.
87 comments
View on Reddit →
Posted in r/dataengineering by u/luminoumen on 3/23/2026 · 119 points

I built a tycoon game about data engineering and the hardest part was balancing the economics

Personal Project Showcase
The creator of a new browser-based tycoon game about data engineering has received positive feedback for its realistic depiction of the field. Users praised the game's authentic representation of the complications that can arise with automated data collection and infrastructure, with some humorously noting they felt the frustration of paying for automation but still having to execute tasks manually. The challenging game balance, which sees some players go bankrupt immediately while others finish in 15 minutes, was also appreciated for capturing the unpredictability of startup economics. There were also playful suggestions for additional features, such as vendor lock-in and streaming services.
27 comments
View on Reddit →
Posted in r/dataengineering by u/Ancient-Proof8013 on 3/27/2026 · 87 points

Working as a Data Engineer in a Bank

Career
Working as a data engineer in a bank, particularly in Europe, offers a good work-life balance, fewer meetings, and a calm, friendly atmosphere. Longer deadlines allow for quality work, although legacy systems pose challenges. Job stability, deep budgets, and slow iteration cycles are benefits, but they come with potential stagnation and less exciting work. Bureaucracy can slow down processes, while varied tech literacy levels can lead to a range of requests. Experiences differ greatly, with some reporting long hours, tight deadlines, and demanding bosses. Overall, the banking environment is often more relaxed yet can limit exposure to cutting-edge technology.
22 comments
View on Reddit →
Posted in r/SQL by u/MojanglesReturns on 3/23/2026 · 45 points

Has anyone imported a 1 TB JSON file into SQL Server before? Need advice!

SQL Server
Importing a 1TB JSON file into SQL Server is a complex task that requires a strategic approach. Common advice is to split the file into smaller, manageable parts with a scripting language, using a streaming deserializer that emits one record at a time. Understanding the file's format is crucial; the ideal scenario is a large array of identically structured records. Some users propose using large language models (LLMs) as a time-saving aid, while others stress the need for data quality verification. Concerns arise around heavily nested files, where naive splitting may not work effectively. The task is generally viewed as challenging but achievable.
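The "deserializer that outputs one entry at a time" idea can be sketched with only Python's standard library. The `iter_json_array` helper and the sample data below are invented for illustration, not taken from the thread; on a real 1 TB file you would pass `open(path, encoding="utf-8")` and batch the yielded records into bulk inserts instead of printing them:

```python
import io
import json

def iter_json_array(stream, chunk_size=65536):
    """Yield records from a top-level JSON array one at a time,
    buffering only a small window of the file in memory.
    Assumes records are JSON objects or arrays (a bare number
    split across chunks could be mis-decoded by this sketch)."""
    decoder = json.JSONDecoder()
    buf = stream.read(chunk_size).lstrip()
    if not buf.startswith("["):
        raise ValueError("expected a top-level JSON array")
    buf = buf[1:]
    while True:
        buf = buf.lstrip().lstrip(",").lstrip()
        if buf.startswith("]"):
            return  # end of the array
        try:
            record, end = decoder.raw_decode(buf)
        except json.JSONDecodeError:
            more = stream.read(chunk_size)  # record is split across chunks
            if not more:
                raise
            buf += more
            continue
        yield record
        buf = buf[end:]

# Demo on an in-memory "file" standing in for the real thing.
sample = io.StringIO('[{"id": 1}, {"id": 2}, {"id": 3}]')
for record in iter_json_array(sample):
    print(record["id"])  # prints 1, 2, 3
```

Each yielded record can then be accumulated into batches of a few thousand rows for bulk loading, which keeps memory flat regardless of file size.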
73 comments
View on Reddit →
