
Anthropic's Sonnet 5 Nearly Caught Its Flagship. The Real Story Is the Price.

LDS Team
Let's Data Science
Claude Sonnet 5 scored 63.2% on agentic coding against Opus 4.8's 69.2%, and edged past Opus on knowledge work. It launched at an introductory $2 per million input tokens, undercutting Opus, GPT-5.5, and Gemini 3.1 Pro. On July 1 it became the default model for every free Claude user.

Neel Chotai handed Claude Sonnet 5 a bug and walked away. He did not tell it how to fix the problem. He did not ask it to prove the fix worked. He just described what was broken.

The model wrote a test that reproduced the failure, implemented a fix, then stashed its own change to confirm the bug came back without it. One pass. Chotai, a Rust engineer who tested the model before release, said the model did all of that without being asked to.

That behavior, finishing a multi-step job without a human walking it through each step, is exactly what Anthropic built Claude Sonnet 5 to do. It is also why this release matters more than a routine version bump.

Anthropic shipped Sonnet 5 on June 30. It is the company's mid-tier model, sitting below the flagship Opus line and above the small, fast Haiku models. For most of the past year, the biggest jumps in agentic skill (the ability to plan, use tools, and run autonomously) came from the expensive Opus models. Sonnet 5 narrows that gap sharply. On an agentic coding benchmark it scored 63.2%, against Opus 4.8's 69.2% and the previous Sonnet's 58.1%, which cuts the distance to Opus from 11.1 points to 6.0, nearly in half. On a knowledge-work benchmark, it slightly beat Opus 4.8 outright.
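The size of that narrowing is simple arithmetic on the three agentic coding scores quoted above. The sketch below measures the share of the old Sonnet-to-Opus gap that Sonnet 5 recovers; the function and variable names are our own, for illustration only, and it treats percentage points on a single benchmark as comparable, which says nothing about performance on other tasks.

```python
# Agentic coding scores quoted in the article (percent, same benchmark).
PREV_SONNET = 58.1
SONNET_5 = 63.2
OPUS_4_8 = 69.2


def gap_closed(prev: float, new: float, target: float) -> float:
    """Fraction of the prev-to-target gap recovered by moving from prev to new."""
    return (new - prev) / (target - prev)


old_gap = OPUS_4_8 - PREV_SONNET  # distance to Opus before Sonnet 5
new_gap = OPUS_4_8 - SONNET_5     # distance to Opus after Sonnet 5
share = gap_closed(PREV_SONNET, SONNET_5, OPUS_4_8)

print(f"gap: {old_gap:.1f} -> {new_gap:.1f} points; closed {share:.0%}")
```

Run as-is, it prints `gap: 11.1 -> 6.0 points; closed 46%`: on this one benchmark, Sonnet 5 recovers a little under half of the distance to Opus 4.8.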

Then comes the part that changes how practitioners will actually build with it. Sonnet 5 costs far less than the model it nearly matches.

The Gap Between Sonnet and Opus Just Narrowed

For many developers, the agentic era began with Sonnet-class models. Claude Sonnet 3.5, 3.6, and 3.7 were the first models that showed real skill at coding and tool use, the first that could be trusted to take actions rather than just answer questions. More recently, the clearest gains moved up the price ladder to Opus.

Sonnet 5 pulls that capability back down. Anthropic describes it as its most agentic Sonnet yet, able to make plans, drive browsers and terminals, and run autonomously at a level that "just a few months ago, required larger and more expensive models." The testers Anthropic cited kept describing the same thing: a model that finishes.

One engineer at Zapier gave it a two-part task that spanned two different systems and watched it complete the whole thing. "That used to stall halfway," said Daniel Shepard, a senior engineer at the company. "For day-to-day automation, it's a no-brainer."

The point is not that Sonnet 5 beats Opus. It does not, on the hardest problems. Anthropic is direct about this: Opus 4.8 remains the model of choice for the toughest judgment calls and deep research. What changed is that the floor moved up. Work that used to require the flagship now runs on the mid-tier model, and the mid-tier model checks its own output along the way.

Price Is the Battlefield Now

Sonnet 5 launched at an introductory price of $2 per million input tokens and $10 per million output tokens, in effect through August 31, after which it rises to the standard $3 and $15. Opus 4.8, the model it chases, costs 2.5 times the launch price on both input and output, and about two-thirds more than even Sonnet 5's standard rates. The full comparison sits below.

| Model | Agentic coding score | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|---|
| Claude Sonnet 5 (launch price, through Aug 31) | 63.2% | $2 | $10 |
| Claude Sonnet 5 (standard, after Aug 31) | 63.2% | $3 | $15 |
| Claude Opus 4.8 | 69.2% | $5 | $25 |

That pricing puts Sonnet 5 below not just Opus but also OpenAI's GPT-5.5 and Google's Gemini 3.1 Pro. It is still more expensive than Google's Gemini 3.5 Flash, which undercuts it on price.
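To see what the table means for an actual bill, multiply each rate by the tokens a job consumes. The minimal sketch below assumes a hypothetical agent run of 2 million input tokens and 200,000 output tokens (illustrative numbers, not measurements) and uses only the list prices from the table. The `Pricing` class is our own construction, not any vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    name: str
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """USD cost of one job at list prices for the given token counts."""
        return (input_tokens * self.input_per_m
                + output_tokens * self.output_per_m) / 1_000_000


# List prices from the comparison table.
PRICES = [
    Pricing("Claude Sonnet 5 (launch)", 2.0, 10.0),
    Pricing("Claude Sonnet 5 (standard)", 3.0, 15.0),
    Pricing("Claude Opus 4.8", 5.0, 25.0),
]

# Hypothetical agent run: token counts are illustrative assumptions.
INPUT_TOKENS = 2_000_000
OUTPUT_TOKENS = 200_000

for p in PRICES:
    print(f"{p.name:<28} ${p.cost(INPUT_TOKENS, OUTPUT_TOKENS):.2f}")
```

With those assumed counts it prints $6.00, $9.00, and $15.00. Because Opus 4.8's input and output rates are both exactly 2.5 times Sonnet 5's launch rates, that ratio holds for any mix of input and output tokens; against the standard rates the multiple drops to about 1.67.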

The timing is not an accident. Anthropic's release lands in the same two-week window as OpenAI's GPT-5.6 Sol and follows Google's push to reframe Gemini as an agentic tool rather than a chatbot. Every major lab is now saying the same thing about its newest model: it is the most agentic one yet. When three companies all claim the same capability, that capability stops being the selling point. The question becomes who can deliver it cheaply and reliably enough to run without a human watching. Sonnet 5 is Anthropic's answer, and the answer is priced to win customers whose agents run all day.

For practitioners, the calculus is familiar from the shift to cheaper open-weight models like GLM-5.2, which beat GPT-5.5 at coding for a sixth of the price. The frontier keeps getting cheaper to reach, and the labs know it.

The Safety Ledger Is Mixed

Anthropic's pre-deployment testing found Sonnet 5 safer overall than its predecessor. It refuses malicious requests more consistently, better resists prompt-injection attacks that try to hijack it, and shows lower rates of hallucination and sycophancy than Sonnet 4.6. On Anthropic's automated audit of misaligned behaviors, it scored better than the model it replaces.

It is not, however, as well-behaved as the flagship. On that same audit, Sonnet 5 showed somewhat higher rates of misaligned behavior than both Opus 4.8 and the Mythos Preview model. Capability and alignment did not move in lockstep.

On cybersecurity, the story is more reassuring. Anthropic says it did not deliberately train Sonnet 5 on cyber tasks, and on evaluations testing dangerous skills like writing software exploits, the model performs far worse than Opus. In one test built with Mozilla, which measured whether models could develop a working exploit for a since-patched Firefox vulnerability, Sonnet 5 never produced a full working exploit, scoring 0.0%. Anthropic still shipped it with real-time cyber safeguards enabled by default, the same ones running on Opus 4.7 and 4.8, continuing the safety-first posture the company took when it shipped Opus 4.8.

The Other Side: Cheaper Tokens, but More of Them

A lower per-token price does not automatically mean a lower bill. Sonnet 5 uses an updated tokenizer, and Anthropic acknowledges that the same input can now map to roughly 1.0 to 1.35 times as many tokens as before, depending on the content. The company says it set the introductory price so the switch is "roughly cost-neutral" against the old Sonnet. In other words, part of the headline discount is offset by the model counting more tokens for the same text. On top of that, higher-effort agentic runs burn through tokens quickly whatever the per-token rate.
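The tokenizer change can be folded into the price with one multiplication: what matters is the cost of a fixed amount of text, so the effective rate is the per-token price times the inflation factor. The sketch below applies Anthropic's stated 1.0 to 1.35 range to input tokens only, since that is what the range describes. The function name and the 1-million-token sample are illustrative assumptions.

```python
def effective_cost(text_tokens_old: int, price_per_m: float, inflation: float) -> float:
    """USD cost of text that measured `text_tokens_old` tokens under the old
    tokenizer, billed at `price_per_m` per 1M tokens after the new tokenizer
    multiplies the count by `inflation`."""
    new_tokens = text_tokens_old * inflation
    return new_tokens * price_per_m / 1_000_000


# Hypothetical sample: 1M input tokens as counted by the old tokenizer.
OLD_TOKENS = 1_000_000

# Sonnet 5 input prices from the article, and the 1.0x to 1.35x range.
for label, price in [("launch", 2.0), ("standard", 3.0)]:
    low = effective_cost(OLD_TOKENS, price, 1.0)
    high = effective_cost(OLD_TOKENS, price, 1.35)
    print(f"{label:<8} ${low:.2f} to ${high:.2f}")
```

It prints a range of $2.00 to $2.70 at the launch price and $3.00 to $4.05 at the standard price, per million tokens of text as the old tokenizer would have counted it. Comparing those figures with what you paid per million tokens on the previous Sonnet is a quick way to test the "roughly cost-neutral" claim against your own workload, since the inflation factor depends on the content.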

There is a broader skeptic's case, too. Enterprises spent the past year learning that agentic AI bills scale in ways their spreadsheets did not predict, and some are now pushing back. The CEO of one AI startup, Lindy, moved all of his company's traffic off Claude to the cheaper, open-weight DeepSeek. That instinct, reaching for the cheapest model that clears the bar rather than the most capable one, is precisely the pressure Sonnet 5 is built to answer. The same cost logic drove the rise of DeepSeek V4, which matched frontier models at a fraction of the cost. Whether "near-Opus quality at a lower price" is enough to hold customers who can get "good-enough quality for far less" from open weights is the real contest, and a benchmark table will not settle it.

The Bottom Line

Claude Sonnet 5 is a strong model at an aggressive price, and that combination is the actual news. The benchmark gap to Opus is real but narrow. The price gap is wide and deliberate. Anthropic has decided that in a market where every lab can now claim agentic capability, the way to win is to make competent, self-checking agents cheap enough to leave running.

The catch is that "cheap" is doing quiet work. A new tokenizer, higher token counts on hard tasks, and a floor set even lower by open-weight rivals all complicate the simple story of a discount. For a data scientist choosing a model this week, the honest read is that Sonnet 5 lowers the cost of good agentic work without ending the argument over whether good is worth paying for at all.

Anthropic put it plainly in its own framing: Opus is still the model for the hardest problems, and Sonnet 5 is the one that gives everyone else a better deal. The labs have stopped competing on whether their models can act. They are competing on what it costs to let them.

