Two months after Claude Opus 4.6, Anthropic has now settled into a two-month upgrade cadence for its most capable generally available model. The pricing did not move. The model identifier in the API is claude-opus-4-7. The benchmarks moved a lot.
This is the story engineers will care about in the morning. The story Anthropic is telling between the lines, about a more powerful model it is still holding back, is the one regulators, enterprise buyers, and Apple are already reading.
The Coding Numbers Are the Reason to Upgrade
On SWE-bench Verified, Opus 4.7 resolved 87.6% of tasks. That is a jump from Opus 4.6's 80.8% and clears Gemini 3.1 Pro's 80.6%. On the harder SWE-bench Pro, the gap widens: Opus 4.7 posts 64.3%, up from 53.4% on Opus 4.6, beating GPT-5.4 at 57.7% and Gemini 3.1 Pro at 54.2%.
The internal numbers Anthropic published are the more interesting ones. On Rakuten's production SWE benchmark, Opus 4.7 resolves three times as many real engineering tasks as Opus 4.6. On a 93-task coding evaluation, the new model shows a 13% improvement over 4.6 and solves 4 tasks that neither Opus 4.6 nor Sonnet 4.6 could complete. On OfficeQA Pro, document reasoning errors fall by 21%.
| Benchmark | Opus 4.7 | Opus 4.6 | GPT-5.4 | Gemini 3.1 Pro |
|---|---|---|---|---|
| SWE-bench Verified | 87.6% | 80.8% | n/a | 80.6% |
| SWE-bench Pro | 64.3% | 53.4% | 57.7% | 54.2% |
| GPQA Diamond | 94.2% | n/a | 94.4% | 94.3% |
| MCP-Atlas (tool use) | 77.3% | n/a | 68.1% | 73.9% |
| MMMLU (multilingual) | 91.5% | n/a | n/a | 92.6% |
| BrowseComp (agentic search) | 79.3% | n/a | 89.3% | n/a |
The takeaways for practitioners are concrete. Opus 4.7 is now the strongest generally available model for agentic coding and scaled tool use. It is roughly tied with competitors on graduate-level reasoning. It gives up ground on multilingual Q&A to Gemini and loses meaningfully on agentic web search to GPT-5.4. For teams running long-horizon coding agents that call out to file systems, databases, and external APIs, the MCP-Atlas number and the SWE-bench Pro jump are the two data points that matter.
Anthropic's framing of the capability gain was unusually direct. The company said users are now handing off "their hardest coding work, the kind that previously needed close supervision" to Opus 4.7 with confidence. A new xhigh effort level gives callers finer control over how much compute the model spends on a single turn. Task budgets, in public beta, let agents cap their own cost before they run away.
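For teams wiring these controls into an agent, the request shape might look like the sketch below. Only the model identifier comes from the launch notes; the parameter names `effort` and `task_budget_tokens` are assumptions for illustration, since Anthropic's notes name the features but not their exact API surface.

```python
# Sketch of a Messages API request payload pinned to the new model.
# "effort" and "task_budget_tokens" are ASSUMED parameter names --
# the launch notes describe the features, not the request schema.

def build_request(prompt: str) -> dict:
    """Assemble a request payload; sending it requires the Anthropic SDK."""
    return {
        "model": "claude-opus-4-7",     # identifier from the launch notes
        "max_tokens": 4096,
        "effort": "xhigh",              # assumed name for the new effort level
        "task_budget_tokens": 200_000,  # assumed name for the beta task-budget cap
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Refactor the retry logic in worker.py")
print(req["model"])  # claude-opus-4-7
```

The point of the budget cap is the last clause of the paragraph above: the agent's spend is bounded before the loop starts, not audited after it runs away.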
Worth noting: Opus 4.7's updated tokenizer means the same prompt consumes between 1.0x and 1.35x the tokens it did on Opus 4.6, according to Anthropic's developer notes, so up to 35% more in the worst case. Teams with fixed budgets should recalibrate.
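The recalibration is simple arithmetic: scale the existing per-request budget by the worst-case inflation factor. A minimal sketch, where the 1.35 factor comes from the developer notes and the budget figure is a made-up example:

```python
# Recompute a fixed token budget for the reported 1.0x-1.35x tokenizer
# inflation on Opus 4.7. Provisioning for the worst case keeps prompts
# that fit on Opus 4.6 from being truncated after migration.
import math

OPUS_47_INFLATION_MAX = 1.35  # worst-case tokens per prompt vs. Opus 4.6

def rescaled_budget(opus_46_budget_tokens: int,
                    inflation: float = OPUS_47_INFLATION_MAX) -> int:
    """Tokens to provision on Opus 4.7 for the same prompt workload."""
    return math.ceil(opus_46_budget_tokens * inflation)

print(rescaled_budget(100_000))  # 135000
```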
Vision Got a Surprise Upgrade
The quieter technical update is on the vision side. Opus 4.7 accepts images up to 2,576 pixels on the long edge, roughly 3.75 megapixels. That is more than three times the resolution ceiling of Opus 4.6. On Anthropic's internal visual acuity benchmark, the model jumped from 54.5% on Opus 4.6 to 98.5% on Opus 4.7.
For engineers who have been piping screenshots of dashboards, architecture diagrams, chemistry structures, or dense financial documents into Claude and watching it miss small labels, this is a meaningful change. A 3.75 megapixel input is enough to capture a reasonable portion of a business spreadsheet without downsampling, which is exactly where Claude has historically struggled.
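The practical pre-processing change is that screenshots now only need to be scaled down when their long edge exceeds the new ceiling. A minimal sketch, assuming the 2,576-pixel limit from Anthropic's announcement (the helper itself is illustrative):

```python
# Compute dimensions that fit an image under the reported 2,576-pixel
# long-edge ceiling while preserving aspect ratio. Images already
# within the limit pass through untouched.

LONG_EDGE_LIMIT = 2576  # Opus 4.7's reported long-edge ceiling, in pixels

def fit_to_limit(width: int, height: int,
                 limit: int = LONG_EDGE_LIMIT) -> tuple[int, int]:
    """Return (width, height) scaled so the long edge is at most `limit`."""
    long_edge = max(width, height)
    if long_edge <= limit:
        return width, height  # within the ceiling; no resampling needed
    scale = limit / long_edge
    return round(width * scale), round(height * scale)

print(fit_to_limit(3840, 2160))  # a 4K screenshot lands at (2576, 1449)
```

Under Opus 4.6's lower ceiling, that 4K dashboard screenshot would have been downsampled much more aggressively, which is where the small labels were getting lost.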
Mythos Is the Model in the Room
The line Anthropic chose not to hide is the one the press noticed first. Opus 4.7, Anthropic acknowledged publicly on Thursday, is not the company's most powerful model. It is the most powerful model the company will let most customers run.
The actual frontier model at Anthropic right now is Claude Mythos, revealed last week in the Project Glasswing briefing. In Anthropic's own red-team testing, Mythos autonomously found and exploited previously unknown vulnerabilities across every major operating system and browser it was tested on. Anthropic pulled broad distribution and routed Mythos through a controlled-access program. Apple is among the handful of "key platform vendors" still permitted to run it.
Opus 4.7 is the commercial counterweight. Anthropic said Thursday that the model ships with "automated safeguards that detect and block requests indicating prohibited or high-risk cybersecurity uses" and a new Cyber Verification Program that grants expanded access to credentialed security professionals. The cyber capabilities were intentionally dialed back relative to Mythos.
The contrast is the business model. A company that has told regulators it can build a model capable of autonomous zero-day discovery is now selling a slightly-less-capable model at the same price point it charged two months ago. The buyers absorb the safety delta. Apple, for reasons Anthropic has not fully explained, does not.
OpenAI Countered Two Days Earlier
Opus 4.7 did not land in a vacuum. On Tuesday, April 14, OpenAI released GPT-5.4-Cyber with an expanded Trusted Access for Cyber program, its own answer to the Mythos problem. OpenAI's bet is that credentialing thousands of security researchers produces a better defensive posture than restricting the model to a handful of named vendors. Anthropic's bet, visible in Thursday's launch, is that it can sell a weaker model to the market and a stronger one to Apple while neither side of that deal blinks.
The commercial context matters. Anthropic's annualized revenue reached roughly 30 billion dollars in April, surpassing OpenAI's 25 billion for the first time, up from 9 billion at the end of 2025. That math only works if Opus 4.7 is good enough to keep enterprises on the Claude platform while Mythos remains off the menu. Thursday's benchmarks are the answer to the question Anthropic's finance team needed answered.
The Other Side: Where Opus 4.7 Still Loses
The Opus 4.7 numbers are not a clean sweep, and Anthropic did not pretend they were. Three gaps stand out.
On BrowseComp, the Scale AI benchmark for agentic web browsing, GPT-5.4 scored 89.3% against Opus 4.7's 79.3%. If the job is dispatching an agent to crawl the open web, read sites, and chain searches, OpenAI still has the better tool. On MMMLU, Gemini 3.1 Pro's 92.6% edges Opus 4.7's 91.5% for multilingual question answering. On GPQA Diamond, GPT-5.4 Pro's 94.4% is a hair above Opus 4.7's 94.2% and Gemini's 94.3%, a three-way tie within noise that undercuts any claim of reasoning superiority.
And then there is the question that did not exist two months ago. If Opus 4.7 is roughly on par with the OpenAI and Google flagships on paper, and Anthropic has already shown that Mythos is qualitatively more capable, then the model the market can actually buy is not the frontier. The frontier is sitting inside Apple's developer tooling and a small controlled-access group. For security teams, infrastructure buyers, and enterprise ML leads, the ceiling of what you can procure right now is visibly lower than what exists.
Anthropic's line on that question has been consistent: the goal of restricting Mythos is to learn how to deploy "Mythos-class models at scale" safely over time. The counterargument, offered mostly by independent AI policy analysts on Thursday, is simpler. A controlled-access frontier model is a competitive moat dressed as a safety posture. Both things can be true at once.
The Bottom Line
Anthropic shipped a better coding model on Thursday, held the price flat, and put it inside four cloud providers and GitHub Copilot on the same day. For teams that have been running agentic coding pipelines against Opus 4.6, the move to 4.7 is mechanical: change the identifier and recompute the token budget. The SWE-bench Verified jump from 80.8% to 87.6% is the kind of gain that changes which tasks an engineering team can ship to an agent without human review.
The deeper story is about what Anthropic will not sell you. Opus 4.7 is the commercial ceiling. Mythos is the real ceiling, and it lives behind a vendor approval process. Apple is inside the room. OpenAI chose to meet this by credentialing thousands of security researchers two days earlier. Anthropic chose to meet it by releasing a strong-but-tempered model at a familiar price and betting the market will accept the deal. Thursday's launch is the first data point on whether that bet holds.
As Anthropic put it in its developer notes, users are handing off their hardest coding work to Opus 4.7 with confidence. Whether Mythos ever joins them in production is a different question, and the one the industry will be watching through the summer.
Sources
- Introducing Claude Opus 4.7 (Anthropic, April 16, 2026)
- Claude Opus 4.7 is now available in Amazon Bedrock (AWS, April 16, 2026)
- Claude Opus 4.7 is generally available (GitHub Changelog, April 16, 2026)
- Anthropic reveals new Opus 4.7 model with focus on advanced software engineering (9to5Mac, April 16, 2026)
- Anthropic launches Claude Opus 4.7 with enhanced coding and vision capabilities (Yahoo Finance, April 16, 2026)
- Anthropic Releases Claude Opus 4.7, Beats GPT-5.4, Gemini 3.1 Pro On Most Benchmarks (OfficeChai, April 16, 2026)
- Claude Opus 4.7 leads on SWE-bench and agentic reasoning (The Next Web, April 16, 2026)
- Anthropic Releases AI Model With Weaker Cyber Skills Than Mythos (Bloomberg, April 16, 2026)
- Anthropic rolls out Claude Opus 4.7, an AI model that is less risky than Mythos (CNBC, April 16, 2026)
- Anthropic releases Claude Opus 4.7, concedes it trails unreleased Mythos (Axios, April 16, 2026)
- Exclusive: Anthropic Preps Opus 4.7 Model, AI Design Tool (The Information, April 15, 2026)