
AI Approaches Mastery On Humanity's Last Exam

March 30, 2026 · By LDS Team

Relevance Score: 7.6

Photo: nypost.com

Scale's 'Humanity's Last Exam' (HLE) benchmark was the subject of a March 30, 2026 report. HLE tests 2,500 PhD-level questions across more than 100 fields and was designed to be AI-resistant. Scores have climbed from under 3% correct (ChatGPT, 2024) to over 45% for recent models, and Scale predicts AI could reach near-perfect 'universal expert' performance within a year. This progress puts pressure on evaluators to strengthen assessments and safety measures.

Key Points

1. Documents rapid AI improvement on HLE: ChatGPT scored under 3% in 2024, while recent models score over 45%.
2. Describes HLE as a 2,500-question, PhD-level, multidisciplinary benchmark designed to be AI-resistant and hidden.
3. Signals an urgent need for stronger evaluations and safety measures as models near 'universal expert' competence.

Scoring Rationale

The March 30, 2026 coverage of Scale's HLE reports notable performance gains across the industry. The score reflects strong scope and relevance. It is tempered by company-backed sourcing and limited peer-reviewed validation. The actionable implications are mostly strategic rather than immediately prescriptive.


Sources

Public references used for this report.

1. nypost.com, "AI dangerously close to solving test that only the brightest minds on Earth could: ‘Human expertise still matters’"


