
OpenAI Shipped GPT-5.5 Six Weeks After 5.4. The Price Doubled.

LDS Team · Let's Data Science · 9 min read
GPT-5.5 hits 82.7% on Terminal-Bench 2.0 and nearly doubles Claude Opus 4.7's FrontierMath score. OpenAI also doubled the API price in the process. A new GPT-5.5 Pro tier costs six times more than the standard API.

On Thursday afternoon, OpenAI co-founder Greg Brockman stepped in front of a livestream camera and introduced a model that did not exist six weeks ago.

GPT-5.5, he said, is "a new class of intelligence built specifically for real work and for powering agents." The model was already rolling out in ChatGPT and Codex to Plus, Pro, Business, and Enterprise subscribers. The API would follow. Brockman framed the release as a step "towards more agentic and intuitive computing."

What he did not say on camera, but what the pricing page made obvious: this model costs twice as much as the one it replaced.

The new price list lays out how much this step cost:

| Tier | Input per 1M tokens | Output per 1M tokens | Context window |
|---|---|---|---|
| GPT-5.5 | $5 | $30 | 1M |
| GPT-5.5 Pro | $30 | $180 | 1M |

GPT-5.4, shipped only six weeks earlier, ran at half these standard rates. The Pro tier, aimed at the hardest reasoning and research tasks, costs six times the standard API rates on both input and output tokens, and roughly twelve times what many teams were paying for GPT-5.4 in March.
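Those multiples follow directly from the rate card. Because both Pro rates are exactly six times the standard rates, the ratio holds for any mix of input and output tokens. A quick sanity check (assuming, as the pricing page implies, that GPT-5.4 ran at exactly half the new standard rates; the workload mix here is arbitrary):

```python
# Per-1M-token rates in USD: (input, output).
STD = (5.0, 30.0)    # GPT-5.5 standard
PRO = (30.0, 180.0)  # GPT-5.5 Pro
OLD = (2.5, 15.0)    # GPT-5.4, half the new standard rates

def blended(rates, input_m, output_m):
    """Total cost for input_m / output_m millions of tokens."""
    return input_m * rates[0] + output_m * rates[1]

# Any workload mix gives the same multiple, since both rates scale by 6x.
mix = (100, 20)  # arbitrary: 100M input tokens, 20M output tokens
print(blended(PRO, *mix) / blended(STD, *mix))  # 6.0
print(blended(PRO, *mix) / blended(OLD, *mix))  # 12.0
```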

The Release Pace Is the Story

Model iteration at OpenAI used to be measured in quarters. GPT-5 launched in August 2025. GPT-5.4 shipped in early March 2026. GPT-5.5 arrived on April 23, about six weeks later. Nothing in the company's public roadmap flagged a release this fast.

The compressed cadence maps onto a different clock: Anthropic's. Claude Opus 4.7 landed in mid-April with benchmark results strong enough to dominate developer Twitter for a week. Within ten days, OpenAI had its answer in production.

TechCrunch framed the pace as the company "bringing itself one step closer to an AI super app," pointing to a pattern Brockman and Sam Altman have discussed for months: collapsing ChatGPT, Codex, and the AI browser into a single unified product that can do real enterprise work without a human babysitter.

The release also reopens a question most ML teams had quietly filed away: when does a frontier lab stop releasing incremental improvements and start releasing pricing changes? GPT-5.5 answers both at once.

The Benchmarks Actually Moved

This is the first release in months where the marketing numbers and the independent numbers tell the same story.

On Terminal-Bench 2.0, the benchmark that measures an agent's ability to plan, iterate, and coordinate tools inside a real command-line environment, GPT-5.5 hits 82.7%. Claude Opus 4.7, the previous leader on agentic coding, sits at 69.4%. That is a 13.3-point gap on a test that, until Thursday morning, Anthropic had effectively owned.

On SWE-Bench Pro, which scores how often a model resolves real GitHub issues end-to-end in a single pass, GPT-5.5 reaches 58.6%. OpenAI's own engineers claim it uses significantly fewer tokens than GPT-5.4 to complete the same work, a detail that matters more than the headline number for anyone running a coding agent at scale.

The math results are the sharper knife. On FrontierMath Tier 4, a benchmark that research mathematicians at Epoch AI built specifically to resist contamination and pattern-matching, GPT-5.5 scores 35.4%. GPT-5.5 Pro, the premium reasoning variant, hits 39.6%. Claude Opus 4.7 on the same tier scores 22.9%. VentureBeat characterized GPT-5.5 as narrowly beating Anthropic's Claude Mythos Preview on agentic coding while opening a wider lead on quantitative reasoning.

| Benchmark | GPT-5.5 | GPT-5.5 Pro | Claude Opus 4.7 | What It Tests |
|---|---|---|---|---|
| Terminal-Bench 2.0 | 82.7% | | 69.4% | Multi-step command-line agents |
| SWE-Bench Pro | 58.6% | | Leads on SWE-Bench Verified | Real GitHub issue resolution |
| FrontierMath Tier 4 | 35.4% | 39.6% | 22.9% | Hardest mathematics, contamination-resistant |

Anthropic's lineup still leads on several other benchmarks, including SWE-Bench Verified-style refactor tasks. The picture most ML teams will recognize: it is not one winner anymore, it is a routing problem.

What Practitioners Actually Get

The context window is now 1 million tokens, matching what Anthropic and Google have been shipping since last year. Batch and Flex pricing are available at half the standard API rate, which somewhat softens the sticker shock for teams that can tolerate delayed responses.

The model is available to paying ChatGPT and Codex users immediately. API access rolls out in the Responses and Chat Completions endpoints in the coming days.

For anyone running coding agents in production, the arithmetic gets interesting fast. Consider a heavy enterprise workload of 1 billion input tokens and 200 million output tokens per day.

| Model | Daily cost at that workload |
|---|---|
| GPT-5.4 (pre-April 23, at half the new rates) | about $5,500 |
| GPT-5.5 (standard) | about $11,000 |
| GPT-5.5 Pro | about $66,000 |

Those numbers ignore any efficiency gains from the new model completing tasks in fewer tokens.

OpenAI's implicit argument is that the new model completes more tasks per token, which should pull those effective costs back down. The practitioner's counter-argument: show me the bill at the end of the month, not the benchmark.
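The table's figures can be reproduced directly from the per-token rates. A minimal sketch, using the published rates and the hypothetical workload above (the GPT-5.4 rates are assumed to be exactly half the new standard rates):

```python
# Per-1M-token rates in USD: (input, output).
RATES = {
    "GPT-5.4 (half the new rate)": (2.5, 15.0),
    "GPT-5.5 (standard)":          (5.0, 30.0),
    "GPT-5.5 Pro":                 (30.0, 180.0),
}

# Hypothetical heavy enterprise workload per day, in millions of tokens.
INPUT_M = 1_000   # 1B input tokens
OUTPUT_M = 200    # 200M output tokens

daily_costs = {}
for model, (in_rate, out_rate) in RATES.items():
    daily_costs[model] = INPUT_M * in_rate + OUTPUT_M * out_rate
    print(f"{model}: ${daily_costs[model]:,.0f}/day")
# GPT-5.4 (half the new rate): $5,500/day
# GPT-5.5 (standard): $11,000/day
# GPT-5.5 Pro: $66,000/day
```

Any efficiency gain shows up as a smaller `INPUT_M`/`OUTPUT_M` for the same work, which is why OpenAI's fewer-tokens claim, if it holds, matters more than the rate card alone.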

How We Got Here in Six Weeks

MARCH 11, 2026
GPT-5.4 ships
OpenAI releases GPT-5.4 with improvements in computer-use agents and long-context reliability.
APRIL 16, 2026
Anthropic ships Claude Opus 4.7
Opus 4.7 posts strong SWE-Bench Verified numbers and becomes the default coding model for many enterprise teams within a week.
EARLY APRIL
GPT-5.5 leaks inside Codex
An unreleased model label appears briefly in the Codex platform. Sam Altman responds to speculation on X with a hint that something is coming.
APRIL 23, 2026
GPT-5.5 releases
OpenAI ships GPT-5.5 and GPT-5.5 Pro with a 1M-token context window, Terminal-Bench 2.0 state-of-the-art, and doubled API pricing.
APRIL 24, 2026
API rollout begins
GPT-5.5 begins rolling out in the Responses and Chat Completions APIs, with Batch and Flex pricing at half the standard rate.

The Counterargument From Pricing

Not every independent reviewer bought the framing.

The-Decoder, one of the first outlets to publish a full analysis of the release, headlined its write-up "OpenAI unveils GPT-5.5, claims a 'new class of intelligence' at double the API price." The point was not that the model was bad. The point was that "a new class of intelligence" is what OpenAI said about GPT-5, and about GPT-5.4, and about GPT-4.5 before that. Every frontier release claims a new class of intelligence. Not every release claims a doubled bill.

There is a practical concern underneath the rhetorical one. SiliconANGLE's analysis noted that many enterprise customers have locked their agent architectures to a specific model tier's cost curve. A team whose unit economics work at GPT-5.4's prices may not work at GPT-5.5's, even if GPT-5.5 resolves more tasks per call. The migration decision is no longer just technical. It is a budget review.

And for open-source-curious teams, the release sharpens the calculus. DeepSeek's V4 preview models, released in the same week, price at a small fraction of GPT-5.5 and still post competitive benchmark numbers on reasoning and coding. Not every workload justifies a frontier API bill. The gap between "best model available" and "best model I can actually afford" is widening, not narrowing.

The Bottom Line

GPT-5.5 is faster, smarter, and almost certainly the best coding agent in production today. It is also twice as expensive as the model it replaced, and the Pro tier costs six times the new standard rate on top of that.

The release is not a surprise in content. It is a surprise in cadence. OpenAI has now trained the market to expect a new frontier model roughly every six weeks, each priced at a premium to the last. That works as long as the benchmark gains keep pace with the bills. It stops working the moment a team looks at its Anthropic invoice, its DeepSeek invoice, or its self-hosted inference cluster, and does the math on what it is actually paying OpenAI for.

For now, the math still works. The team at Terminal-Bench did not build their benchmark hoping any model would clear 80%, and GPT-5.5 cleared it on the first try. That is the kind of result that moves procurement meetings.

What happens in six more weeks is the question nobody at OpenAI answered on Thursday. If the pattern holds, GPT-5.6 is already in training, and somebody is already drafting a new pricing page. Every frontier lab is now shipping a new model every time the other one does. The customer pays for every round.
