Two weeks after GPT-5.4 set a new bar for computer use, OpenAI returned to the release schedule with something quieter but, for most developers, more immediately relevant. On Monday, the company shipped two compact models — GPT-5.4 mini and GPT-5.4 nano — optimized for coding assistants, autonomous subagents, and computer control tasks.
The pitch is familiar: smaller, faster, and cheaper than the flagship. The benchmarks back it up. But the price tags compared to the previous generation are drawing attention in developer communities. The "mini" tier now costs three times more per input token than its predecessor. The "nano" tier costs four times more.
Faster and more capable, yes. Cheaper than GPT-5.4? Yes. Cheap in any absolute sense? That depends on whom you ask.
What OpenAI Actually Shipped
GPT-5.4 mini is the headline release. It runs more than twice as fast as GPT-5 mini and posts benchmark scores that sit close to the full GPT-5.4 model across nearly every dimension.
On SWE-Bench Pro, the software engineering benchmark that tests an AI's ability to resolve real GitHub issues, mini scores 54.4% — compared to the full model's 57.7%. On OSWorld-Verified, which measures autonomous desktop computer operation, mini reaches 72.1%, just three points shy of GPT-5.4's 75.0%. Human performance on that same benchmark sits at 72.4%. Mini is effectively matching it.
The nano model is a different story. It is not trying to match the flagship. It scores 39.0% on OSWorld-Verified and 52.4% on SWE-Bench Pro. Those numbers are modest, but they represent, as OpenAI describes it, "a big step up from GPT-5 nano." The nano is built for high-volume, cost-sensitive tasks where the individual request is simple but the scale is massive.
Simon Willison, whose notes on AI releases are widely read by developers, captured the nano's value proposition with a single calculation: using GPT-5.4 nano, a developer can describe 76,000 photos for $52.
For vision-heavy pipelines processing millions of images, that matters more than benchmark leaderboard position.
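Willison's figure can be sanity-checked with a quick back-of-envelope script. The prices come from OpenAI's published rates; the per-image token split is an assumption chosen for illustration, not a reported number.

```python
# Back-of-envelope for GPT-5.4 nano vision costs.
# Prices are the published per-1M-token rates; token counts are assumptions.
INPUT_PER_M = 0.20    # USD per 1M input tokens
OUTPUT_PER_M = 1.25   # USD per 1M output tokens

def cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single request at nano pricing."""
    return input_tokens / 1e6 * INPUT_PER_M + output_tokens / 1e6 * OUTPUT_PER_M

# Willison's figure: 76,000 photos for $52, i.e. about $0.00068 per photo.
per_photo_budget = 52 / 76_000

# One plausible split that lands near that budget (assumed, not reported):
# ~1,700 tokens for the image plus prompt, ~270 tokens for the description.
print(f"budget per photo: ${per_photo_budget:.6f}")
print(f"assumed request:  ${cost_usd(1_700, 270):.6f}")
```

The exact split hardly matters; at these rates, any reasonable caption-length request stays well under a tenth of a cent per image.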
The Benchmarks in Full
Both models are multimodal — they accept text, images, and screenshots, and can operate computer interfaces. The full benchmark picture:
| Benchmark | GPT-5.4 | GPT-5.4 mini | GPT-5.4 nano |
|---|---|---|---|
| SWE-Bench Pro (coding) | 57.7% | 54.4% | 52.4% |
| OSWorld-Verified (computer use) | 75.0% | 72.1% | 39.0% |
| GPQA Diamond (scientific reasoning) | 93.0% | 88.0% | 82.8% |
| Terminal-Bench 2.0 (terminal tasks) | 75.1% | 60.0% | 46.3% |
| Toolathlon (tool-calling) | 54.6% | 42.9% | 35.5% |
| MCP Atlas (multi-step tool use) | 67.2% | 57.7% | 56.1% |
| MMMU-Pro (vision/multimodal) | 81.2% | 76.6% | 66.1% |
The mini model holds up well in scientific reasoning and vision tasks — losing only 5 points to the full model on GPQA Diamond and just under 5 on MMMU-Pro. The larger gaps appear in terminal work and tool-calling, where mini drops 15 points on Terminal-Bench 2.0 and about 12 on Toolathlon.
The nano has a distinct profile: strong in reasoning relative to its cost tier (82.8% on GPQA Diamond), weaker in computer control tasks (39.0% on OSWorld-Verified). That profile maps to its intended role — fast reasoning at scale, not autonomous desktop operation.
Pricing: The Complicated Part
OpenAI released these models under the framing of "affordable, capable, fast." Relative to GPT-5.4 itself, the absolute pricing supports that framing. Comparison with the previous generation complicates it.
| Model | Input (per 1M tokens) | Output (per 1M tokens) |
|---|---|---|
| GPT-5.4 | $2.50 | $15.00 |
| GPT-5.4 mini | $0.75 | $4.50 |
| GPT-5.4 nano | $0.20 | $1.25 |
| GPT-5 mini (previous gen) | $0.25 | $2.00 |
| GPT-5 nano (previous gen) | $0.05 | $0.40 |
GPT-5.4 mini costs three times more per input token than GPT-5 mini. GPT-5.4 nano costs four times more per input token than GPT-5 nano. Output tokens rose less steeply but still sharply: 2.25x for mini and roughly 3x for nano.
For developers who built pipelines budgeted around GPT-5 mini pricing, this is not a simple upgrade: it is a repricing of the workload. The capabilities are genuinely better, and for most tasks the throughput increase offsets some of the cost. But an identical workload, with the same requests, now produces a substantially higher bill.
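To make the repricing concrete, here is a minimal comparison using the prices above; the monthly token volumes are invented for illustration.

```python
# What an identical workload costs under the old and new mini tiers.
# Prices are the published per-1M-token rates; the volumes are made up.
PRICES = {  # USD per 1M tokens: (input, output)
    "gpt-5-mini":   (0.25, 2.00),
    "gpt-5.4-mini": (0.75, 4.50),
}

def monthly_cost(model: str, input_m: float, output_m: float) -> float:
    """Cost for input_m / output_m millions of tokens per month."""
    inp, out = PRICES[model]
    return input_m * inp + output_m * out

# Example: 500M input tokens and 50M output tokens per month.
old = monthly_cost("gpt-5-mini", 500, 50)    # 500*0.25 + 50*2.00 = 225.0
new = monthly_cost("gpt-5.4-mini", 500, 50)  # 500*0.75 + 50*4.50 = 600.0
print(f"old: ${old:,.0f}  new: ${new:,.0f}  ratio: {new/old:.2f}x")
```

Because input and output are repriced by different multiples, the effective increase depends on the workload's input/output mix; this input-heavy example lands at about 2.7x.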
The comparison to current competitors is more favorable. Gemini 3 Flash costs $0.50 per million input tokens — a third cheaper than GPT-5.4 mini on input, and proportionally less expensive on output. But the benchmark gap, particularly on computer use and coding, is significant in GPT-5.4 mini's favor. Whether that gap justifies the premium depends on what the pipeline actually does.
Both models come with 400,000-token context windows, matching the full GPT-5.4 and substantially exceeding what most comparable models offered at this price tier a year ago.
The Subagent Architecture That Changes the Calculation
The pricing question looks different inside OpenAI's Codex platform, where the mini and nano models are designed to operate as subagents rather than primary models.
The pattern OpenAI describes: a full GPT-5.4 instance handles planning, task decomposition, and final evaluation. It then delegates parallel subtasks — code execution, file search, terminal operations, browser navigation — to mini or nano instances running concurrently. Each mini instance "burns only 30 percent of the GPT-5.4 quota," which means routing a subtask to mini rather than keeping it on the full model reduces that specific task's cost to roughly one-third.
This is not a new architecture — hierarchical agent systems have existed in various forms since 2024 — but the capability level of the mini model makes the delegation less painful. At 54.4% on SWE-Bench Pro, GPT-5.4 mini is capable enough to handle the majority of concrete subtasks in a software engineering workflow. The full model stays focused on what only it can do.
For developers building AI agents and tool-calling workflows, this is the practical argument for the new models: not just "cheaper than the flagship" but "cheap enough per subtask to make parallel orchestration economical."
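The orchestration pattern can be sketched in a few lines. The model names and the 30-percent quota figure come from OpenAI's description; the API surface, function names, and task list are hypothetical stand-ins, not real Codex calls.

```python
# A sketch of the orchestrator/subagent pattern described above.
# call_model() is a placeholder for a real API client; the 0.3 quota
# figure is OpenAI's stated mini-instance cost, everything else is assumed.
from concurrent.futures import ThreadPoolExecutor

QUOTA_COST = {"gpt-5.4": 1.0, "gpt-5.4-mini": 0.3}  # relative quota burn

def call_model(model: str, task: str) -> str:
    # Stand-in for an actual completion call.
    return f"[{model}] done: {task}"

def run_workflow(goal: str, subtasks: list[str]) -> tuple[list[str], float]:
    # Full model plans and reviews; mini instances run subtasks in parallel.
    plan = call_model("gpt-5.4", f"plan: {goal}")
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(
            lambda t: call_model("gpt-5.4-mini", t), subtasks))
    call_model("gpt-5.4", f"review plan={plan} results={results}")
    # Quota: two full-model calls plus 0.3 per delegated subtask.
    quota = 2 * QUOTA_COST["gpt-5.4"] + len(subtasks) * QUOTA_COST["gpt-5.4-mini"]
    return results, quota

results, quota = run_workflow(
    "fix failing CI", ["run tests", "search repo", "patch file"])
print(f"{quota:.1f}")  # 2.9, versus 5.0 if every call used the full model
```

The economics follow directly: three delegated subtasks burn 0.9 units of quota instead of 3.0, and the parallel execution also shortens wall-clock time.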
The Counterargument: Price Trends Are Going the Wrong Way
The AI pricing story since 2023 has been consistent: every generation gets cheaper. GPT-4 Turbo, then GPT-4o, then GPT-4o mini, each dramatically undercutting the last. Developers built that assumption into their planning.
GPT-5.4 mini and nano break that trend for both compact tiers.
Some developers have noted that comparing to GPT-5 mini misses the point — GPT-5.4 mini is a substantially more capable model doing substantially more complex tasks, so the price-per-capability may still be better. That is a fair argument. But it requires changing how you measure value, from price-per-token to price-per-task-completed.
For workloads where the task definition is stable — image captioning, summarization, classification — the price increase is real and not offset by capability gains that the workload can use. A pipeline describing product photos does not benefit from improved SWE-Bench scores.
There is also the competitive pressure argument. Google's Gemini 3 Flash and Flash-Lite have been aggressive on pricing. Meta's open-weight models continue to pull the price floor toward zero for teams with infrastructure. OpenAI pricing mini at $0.75/M input is a bet that the quality difference is large enough to hold the premium. For coding-heavy and computer-use workloads, that bet looks reasonable. For commodity inference tasks, it is harder to defend.
OpenAI's own statement frames this as performance-first: "GPT-5.4 mini delivers major improvements over GPT-5 mini in coding, reasoning, multimodal understanding, and tool usage." The improvements are real. Whether they justify the price increase is a business decision that varies by workload.
Availability and What Comes Next
GPT-5.4 mini is available immediately across the API, Codex, and ChatGPT. GPT-5.4 nano is API-only for now — no ChatGPT integration announced, suggesting it is primarily positioned for developer-built pipelines rather than direct consumer use.
The timeline since March 5 has moved quickly: the flagship GPT-5.4 shipped first, and the compact models followed just twelve days later.
The GPT-5.4 full model article covers the flagship release in detail, including the computer use architecture and the OSWorld results that made it notable.
The Bottom Line
OpenAI has delivered two genuinely capable compact models. GPT-5.4 mini nearly matches the flagship on the benchmarks that matter most for coding and computer use. GPT-5.4 nano handles high-volume vision and reasoning tasks at a price that makes previously expensive pipelines feasible. The 400,000-token context window is a meaningful upgrade over what previous mini-tier models offered.
The price increase over the previous generation is real and significant. The optimistic read is that OpenAI is pricing to capability, not to maintain a consistent price-per-token curve, and that the capability jump from GPT-5 mini to GPT-5.4 mini is large enough to justify it. The pessimistic read is that OpenAI has figured out developers will pay for quality in agentic workloads — where task completion matters more than token cost — and is pricing accordingly.
For developers building AI agents and automated pipelines, the subagent architecture is the real story. A full GPT-5.4 orchestrator with mini and nano subagents handling parallel subtasks gives you near-flagship performance at roughly one-third the per-task cost. That is a compelling system design, and it is one that effectively locks the workflow into OpenAI's model family.
The question every developer should ask: does my workload benefit from the capability gains, or am I paying three times more per token for improvements I cannot use?
Sources
- OpenAI ships GPT-5.4 mini and nano, faster and more capable but up to 4x pricier — The Decoder (March 17, 2026)
- GPT-5.4 mini and GPT-5.4 nano, which can describe 76,000 photos for $52 — Simon Willison's Weblog (March 17, 2026)
- Introducing GPT-5.4 — OpenAI Blog (March 5, 2026)
- OpenAI Built a Model That Uses a Computer Better Than You Do — LetsDataScience (March 6, 2026)
- Gemini API Pricing — Google AI Developer Documentation (March 2026)
- OpenAI API Pricing — OpenAI (March 2026)
- OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments — OSWorld Project
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? — SWE-bench Project