Researchers Release Humanity's Last Exam Benchmark

In early 2025, an international consortium released Humanity's Last Exam (HLE), a 2,500-question, expert-vetted benchmark that covers mathematics, the humanities, and the natural sciences and is intended to assess large language models. Its short-answer and multiple-choice items are written by experts to be unambiguous and difficult for models. Leading systems initially scored in the single digits, and GPT-5 later reached about 25 percent. HLE aims to track AI expertise, though it measures task performance rather than general intelligence.
Scoring Rationale
High novelty and industry-wide relevance, but limited by its focus on curated short-answer questions and by the risk that models overfit to the test.


