Structured Context Engineering Evaluates LLM SQL Performance

Damon McMillan has published a paper reporting 9,649 experiments on context engineering for structured data, evaluating 11 models and four formats against SQL schemas ranging from 10 to 10,000 tables. The study finds that frontier models (Opus 4.5, GPT-5.2, Gemini 2.5 Pro) outperform leading open-source models, and it identifies a 'grep tax' for TOON formats that increases token usage. The results inform the design of file-native agents.
Scoring Rationale
Comprehensive, large-scale experiments across multiple models drive the score, but reliance on a single-source paper limits its peer-reviewed credibility, and the depth on filesystem retrieval could be expanded.
