AI Approaches Mastery On Humanity's Last Exam

Scale's 'Humanity's Last Exam' (HLE) benchmark, covered in a March 30, 2026 report, comprises 2,500 PhD-level questions across 100+ fields and was designed to be AI-resistant. Model scores rose from under 3% correct (ChatGPT, 2024) to over 45% recently, and Scale predicts AI could reach near-perfect 'universal expert' performance within a year. That progress pressures evaluators to strengthen assessments and safety measures.
Scoring Rationale
Fresh March 30, 2026 coverage of Scale's HLE reports notable, industry-wide performance gains. The score reflects strong scope and relevance, tempered by company-backed sourcing and limited peer-reviewed validation; the implications are mostly strategic rather than immediately prescriptive.
Sources
- AI dangerously close to solving test that only the brightest minds on Earth could: ‘Human expertise still matters’ (nypost.com)



