Researchers Introduce Humanity's Last Exam Benchmark

A Nature study published Jan. 28, co-led by Phan Nguyen Hoang Long, introduces Humanity's Last Exam (HLE), a 2,500-question multimodal benchmark that assesses expert-level reasoning in large language models such as Gemini, GPT-5.2, and Grok. Developed with contributions from more than 1,000 professors across 500+ institutions, HLE already informs model leaderboards and industry evaluations; current AI scores remain well below those of top-tier human experts (~90%).
Scoring Rationale
A peer-reviewed Nature benchmark with extensive expert and industry adoption, meriting the highest impact rating despite ongoing calibration and rapid model improvement.
Sources
- Vietnamese engineer co-leads Nature paper introducing Humanity's Last Exam for AI, a project advised by Alexandr Wang based on Elon Musk's idea (e.vnexpress.net)
