Research · Benchmarks · LLM · Multimodal · AI Safety
Researchers Introduce Humanity's Last Exam Benchmark
Relevance Score: 10.0
A Nature study published Jan. 28, co-led by Phan Nguyen Hoang Long, introduces Humanity's Last Exam (HLE), a 2,500-question multimodal benchmark that assesses expert-level reasoning in LLMs such as Gemini, GPT-5.2, and Grok. Developed with contributions from more than 1,000 professors across 500+ institutions, HLE already informs model leaderboards and industry evaluations. Current AI scores remain well below the ~90% achieved by top-tier human experts.


