Mercor Finds AI Agents Fail Consulting Tasks
Mercor published the APEX-Agents benchmark, which shows leading AI agents completing under 25% of real-world consulting, banking, and legal tasks on the first try and only about 40% after eight attempts. OpenAI's GPT-5.2 initially completed roughly 23% of tasks, while Anthropic's Opus 4.6 reached nearly 33%. The study found that agents perform well at research and single-tool data analysis but fail at long-horizon, multi-step planning and cross-file coordination. Mercor CEO Brendan Foody says rapid model improvement could soon displace some consulting roles.
Scoring Rationale
Moderate novelty and practical relevance, limited by a single-company benchmark and lack of peer-reviewed validation.