Anthropic's Mythos Preview tops SWE-bench benchmarks

techmeme.com | April 7, 2026

Relevance Score: 7.0

Anthropic says Mythos Preview achieves 93.9% on SWE-bench Verified, compared with 80.8% for Opus 4.6, a gap of 13.1 percentage points. On SWE-bench Pro, Mythos Preview scores 77.8% versus 53.4% for Opus 4.6, a gap of 24.4 percentage points. Michael Nuñez reports the comparison for VentureBeat.

Scoring Rationale

Notable and actionable performance differences on a software-engineering benchmark make this relevant to practitioners, but the headline lacks methodological context and fuller evaluation details.

Sources

Techmeme: Anthropic says Mythos Preview achieves 93.9% on SWE-bench Verified, compared with 80.8% for Opus 4.6, and 77.8% on SWE-bench Pro, versus 53.4% for Opus 4.6 (Michael Nuñez/VentureBeat)
techmeme.com