Research · Agents · Consulting · Mercor · Benchmarks

Mercor Finds AI Agents Fail Consulting Tasks

February 9, 2026 | By LDS Team

Relevance Score: 6.1

Photo: i.insider.com

Mercor has published the APEX-Agents benchmark, which shows that leading AI agents completed under 25% of real-world consulting, banking, and legal tasks on the first try and only about 40% after eight attempts. In the initial results, OpenAI's GPT-5.2 completed roughly 23% of tasks; Anthropic's Opus 4.6 has since reached nearly 33%. The study found that agents perform well at research and single-tool data analysis but fail at long-horizon, multi-step planning and at coordinating work across files. Mercor CEO Brendan Foody says rapid model improvement could soon displace some consulting roles.
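Benchmarks usually report "first try" and "after eight attempts" as pass@1 and pass@8. The report does not say how Mercor computes these, so the sketch below assumes the standard unbiased pass@k estimator from the Codex paper (Chen et al., 2021): per task, 1 - C(n-c, k) / C(n, k) for n sampled attempts with c successes, averaged over tasks. The function names and the ten-task dataset are hypothetical, chosen only to reproduce roughly the reported shape (about 22.5% on the first try, 40% after eight).

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: n attempts were sampled and c succeeded.

    Returns the probability that at least one of k attempts, drawn without
    replacement from the n, is a success.
    """
    if n - c < k:
        return 1.0  # every size-k subset must contain a success
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-task pass@k over (n_attempts, n_successes) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


def independent_pass_at_k(p1: float, k: int) -> float:
    """pass@k if every attempt on every task succeeded independently with probability p1."""
    return 1.0 - (1.0 - p1) ** k


# Hypothetical data: 10 tasks, 8 attempts each.
# Two tasks always solved, two solved once, six never solved.
toy = [(8, 8), (8, 8), (8, 1), (8, 1)] + [(8, 0)] * 6

p1 = benchmark_pass_at_k(toy, 1)
p8 = benchmark_pass_at_k(toy, 8)
print(f"pass@1 = {p1:.3f}")                                       # pass@1 = 0.225
print(f"pass@8 = {p8:.3f}")                                       # pass@8 = 0.400
print(f"pass@8 if independent = {independent_pass_at_k(p1, 8):.3f}")  # 0.870
```

The comparison in the last line is one possible reading of the reported figures, not a finding from the report: if each attempt were an independent draw at about 23%, eight tries would succeed roughly 87% of the time. A plateau near 40% instead suggests that failures cluster on the same hard tasks, which is consistent with the study's point about long-horizon, multi-step work.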

Key Points

1. The report shows that agents complete under 25% of tasks on the first try and about 40% after eight attempts.
2. Models struggle with long-horizon, multi-step planning and with coordination across multiple files and tools.
3. Practitioners should use agents for research and single-tool analysis and keep humans on complex work.

Scoring Rationale

Moderate novelty and practical relevance, limited by a single-company benchmark and lack of peer-reviewed validation.

Sources

Public references used for this report:

1. businessinsider.com, "AI agents failed at real-world consulting tasks — but Mercor's CEO says they're still on track to replace consultants"
