Tags: Research, LLM, SQL generation, open source, context engineering

Structured Context Engineering Evaluates LLM SQL Performance

February 10, 2026 | By LDS Team

Relevance Score: 8.1


Damon McMillan has published a paper reporting 9,649 experiments on context engineering for structured data. The study evaluates 11 models and four context formats on SQL schemas ranging from 10 to 10,000 tables. It finds that frontier models (Opus 4.5, GPT-5.2, Gemini 2.5 Pro) outperform the leading open-source models, and it identifies a 'grep tax': the TOON format increases token usage. The results inform the design of file-native agents.

Key Points

1. Conducts 9,649 experiments across 11 models, 4 formats, and schemas of up to 10,000 tables
2. Finds that frontier models (Opus 4.5, GPT-5.2, Gemini 2.5 Pro) substantially outperform open-source models
3. Shows that the TOON format incurs a 'grep tax' that increases token usage, so practitioners may need to adapt their retrieval strategy or choice of format
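
To make the format comparison concrete, here is a minimal sketch of how one schema could be serialized in two different context formats and compared by size. Everything in it is an assumption for illustration: the toy schema, the function names, and the choice of JSON and a Markdown table (the source names only TOON among the four formats). Character count is used as a crude stand-in for token count, and this is not the paper's methodology.

```python
import json

# Hypothetical toy schema: table name -> list of (column, type) pairs.
SCHEMA = {
    "orders": [("id", "INTEGER"), ("customer_id", "INTEGER"), ("total", "REAL")],
    "customers": [("id", "INTEGER"), ("name", "TEXT")],
}


def to_json(schema):
    """Serialize the schema as indented JSON."""
    payload = {
        table: [{"name": col, "type": typ} for col, typ in cols]
        for table, cols in schema.items()
    }
    return json.dumps(payload, indent=2)


def to_markdown(schema):
    """Serialize the schema as one Markdown table per SQL table."""
    lines = []
    for table, cols in schema.items():
        lines.append(f"## {table}")
        lines.append("| column | type |")
        lines.append("| --- | --- |")
        lines.extend(f"| {col} | {typ} |" for col, typ in cols)
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


def size_report(schema):
    """Return character counts per format (a rough proxy, not real token counts)."""
    return {
        "json": len(to_json(schema)),
        "markdown": len(to_markdown(schema)),
    }


if __name__ == "__main__":
    for fmt, chars in size_report(SCHEMA).items():
        print(f"{fmt}: {chars} characters")
```

Static size is only one part of the cost. A 'grep tax' as described above concerns the tokens an agent spends while working with a format, so a fuller comparison would also log the tokens consumed per task for each format.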

Scoring Rationale

Comprehensive, large-scale experiments across multiple models drive the score. The report rests on a single source, which limits its credibility relative to peer-reviewed work, and its coverage of filesystem retrieval could go deeper.


Sources

Public references used for this report.


1. simonwillison.net: Structured Context Engineering for File-Native Agentic Systems
