Analysis · Benchmarks · LLM · Evaluation Metrics
Researchers Introduce Humanity's Last Exam Benchmark
Relevance Score: 8.3
A study published this week in Nature introduces Humanity’s Last Exam, a 2,500-question benchmark designed to probe tasks that current AI systems cannot solve. The benchmark, assembled by a global collaboration of nearly 1,000 experts, found that leading models initially scored below 9% accuracy, highlighting large capability gaps and prompting discussion of benchmarks' limits and the need for task-specific, real-world evaluation metrics.
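For context on what a headline figure like "below 9%" means in practice, here is a minimal, illustrative sketch of scoring a model on a question-answer benchmark with exact-match accuracy. The file layout, field names, and `ask_model()` function are hypothetical placeholders, not the study's official evaluation harness.

```python
import json


def ask_model(question: str) -> str:
    """Placeholder for a call to whatever model is being evaluated."""
    raise NotImplementedError


def benchmark_accuracy(path: str) -> float:
    """Exact-match accuracy over a JSONL file of {"question", "answer"} items."""
    correct = 0
    total = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            item = json.loads(line)  # one question per line
            prediction = ask_model(item["question"])
            correct += prediction.strip() == item["answer"].strip()
            total += 1
    return correct / total if total else 0.0


# On a 2,500-question set, an accuracy below 0.09 (9%) corresponds to
# fewer than roughly 225 correct answers.
```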



