AI Systems Fail Real Remote Work Tasks

Researchers from Scale AI and the Center for AI Safety published the Remote Labor Index in October, testing top AI systems including OpenAI's ChatGPT, Google's Gemini and Anthropic's Claude on hundreds of real freelance tasks. The team found that the best-performing AI autonomously completed only 2.5% of projects; the systems often failed at visual design and long-term memory tasks and produced technical errors. The results suggest current models can assist human contractors but are far from replacing them.
Scoring Rationale
Provides a systematic evaluation on real work with strong methodology; limited because it is a snapshot in time and models continue to improve.


