Research | llm judge, table extraction, pdf parsing, synthetic benchmark

LLM Judges Improve PDF Table Extraction Evaluation

March 20, 2026 | By LDS Team

Relevance Score: 8.1

The paper presents a benchmarking framework using synthetically generated PDFs with precise LaTeX ground truth, sourcing tables from arXiv to ensure realistic complexity. It introduces an LLM-as-a-judge semantic evaluation integrated into a matching pipeline, showing LLM-based scores correlate with human judgments at Pearson r=0.93 versus TEDS r=0.68 and GriTS r=0.70. Evaluating 21 PDF parsers across 100 documents (451 tables) reveals major performance gaps and provides practical parser selection guidance.
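How a metric-versus-human agreement figure such as the Pearson r values above is obtained can be illustrated with a minimal sketch. It assumes each table has one human rating and one score per automatic metric; the function name, variable names, and all numbers below are invented for illustration and are not taken from the paper, whose exact protocol is not described in this summary.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-table scores (illustrative values only, not paper data).
human_ratings = [1.0, 2.0, 3.0, 4.0, 5.0]
judge_scores = [0.1, 0.3, 0.5, 0.7, 0.9]      # tracks the human ratings closely
structural_scores = [0.5, 0.4, 0.6, 0.5, 0.6]  # tracks them only loosely

r_judge = pearson_r(human_ratings, judge_scores)
r_structural = pearson_r(human_ratings, structural_scores)
print(f"judge vs human:      r = {r_judge:.2f}")
print(f"structural vs human: r = {r_structural:.2f}")
```

A metric whose scores rise and fall with the human ratings yields r near 1, while a metric that tracks them only loosely yields a lower r. That is the sense in which the reported r=0.93 is compared against r=0.68 and r=0.70.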

Key Points

1. Demonstrates that LLM-based semantic evaluation achieves Pearson r=0.93 with human judgments
2. Highlights that TEDS (r=0.68) and GriTS (r=0.70) poorly capture semantic table similarity
3. Provides practitioners with a reproducible benchmark and guidance for selecting among 21 PDF parsers

Scoring Rationale

Strong methodological contribution and actionable benchmark, supported by human validation, but the sole source is an arXiv preprint, so the findings have not yet been confirmed by peer review.

Sources

Public references used for this report.

1. arxiv.org: [2603.18652] Benchmarking PDF Parsers on Table Extraction with LLM-based Semantic Evaluation
