
Google DeepMind Releases FACTS Benchmark Measuring Accuracy

December 12, 2025 | By LDS Team

Relevance Score: 9.0


This week, Google DeepMind introduced the FACTS Benchmark Suite, which measures how reliably AI models produce factually accurate answers across four tasks: answering factoid questions from internal knowledge, using web search, grounding answers in long documents, and interpreting images. The best-performing model, Gemini 3 Pro, scored 69% accuracy, and other leading models scored substantially lower. Even the top model is therefore wrong on roughly one-third of answers, which raises risks in sectors such as legal, finance, and healthcare.
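To make the headline figure concrete, the short Python sketch below converts the reported 69% accuracy into an error rate and an expected error count. The batch size of 1,000 queries is a hypothetical value chosen for illustration, and the sketch assumes the single overall accuracy applies uniformly to every query, which may not hold across the four tasks.

```python
# Headline accuracy reported for the top model (Gemini 3 Pro) in the article.
top_accuracy = 0.69

# The error rate is the complement of accuracy.
error_rate = 1.0 - top_accuracy

# Hypothetical batch size, chosen only for illustration (not from the benchmark).
batch_size = 1000
expected_errors = round(error_rate * batch_size)

print(f"error rate: {error_rate:.0%}")
print(f"expected errors per {batch_size} answers: {expected_errors}")
```

Under these assumptions, about 31% of answers would be wrong, which is the basis for the "roughly one-third" figure.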

Key Points

1. Introduces the FACTS benchmark, which tests models on internal facts, web search, long documents, and images
2. Shows limited factual reliability: Gemini 3 Pro reaches 69% accuracy, and other leading models score much lower
3. Warns practitioners that factual errors risk legal, financial, and healthcare harms and that verification workflows are required

Scoring Rationale

A major official benchmark provides industry-wide factuality measurements, but it is primarily diagnostic and does not deliver immediate technical fixes.


Sources

Public references used for this report.

1. businessinsider.com: Google researchers find the best AI model is 69% right
