FACTS Benchmark Suite Establishes Factuality Standard

The FACTS Benchmark Suite, developed by the FACTS team with Kaggle, has been released to systematically evaluate LLM factual accuracy across four dimensions. The suite comprises 3,513 curated examples across public and private splits, with managed leaderboards. It adds Parametric, Search, and Multimodal benchmarks alongside Grounding v2 and reports an overall FACTS Score. Gemini 3 Pro leads at 68.8%, and no model exceeds 70% overall. The project aims to support ongoing research on LLM factuality.
Key Points
- Introduces a four-dimensional benchmark covering Parametric, Search, Grounding v2, and Multimodal factuality
- Shows performance gaps: the top model, Gemini 3 Pro, scores 68.8%, and no model exceeds 70% overall
- Enables standardized evaluation with Kaggle-held private sets, a public leaderboard, and reusable datasets totaling 3,513 examples
Scoring Rationale
Authoritative release and broad applicability raise the score; limited novelty and sub‑70% model accuracy curb transformational impact.