OpenAI Releases GPT-5.5, Improving Math and Coding

OpenAI today launched GPT-5.5, an upgraded LLM focused on advanced math, agentic coding, and tool use. The company is shipping two variants, a standard GPT-5.5 and a higher-capacity GPT-5.5 Pro, rolling them into ChatGPT and Codex for paid subscribers while delaying full API availability. OpenAI highlights large gains on hard benchmarks, including 39.6% by GPT-5.5 Pro on FrontierMath Tier 4 and 82.7% by the standard model on Terminal-Bench 2.0. The release emphasizes improved handling of ambiguous instructions, lower token usage on coding tasks, per-token latency on par with GPT-5.4, and built-in safeguards informed by internal and external red teaming. Early use cases include optimizing OpenAI's own infrastructure and assisting in the discovery of a new mathematical proof.
What happened
OpenAI released GPT-5.5, shipping two variants: the baseline GPT-5.5 and the more capable, pricier GPT-5.5 Pro. The models are available now in ChatGPT and Codex for Plus, Pro, Business, and Enterprise subscribers, with API access postponed while OpenAI finalizes additional safeguards. OpenAI reports large benchmark gains: GPT-5.5 Pro scored 39.6% on FrontierMath Tier 4 and the standard GPT-5.5 achieved 82.7% on Terminal-Bench 2.0, outperforming Anthropic's Claude Opus 4.7 on many tests.
Technical details
OpenAI describes GPT-5.5 as a model optimized for agentic work: planning, tool use, and sustained multi-step tasks. Key characteristics OpenAI calls out include serving latency on par with GPT-5.4 despite the higher capability, and reduced token usage for coding workloads relative to prior versions. OpenAI also emphasizes improved interpretation of ambiguous instructions: the model more often infers missing steps and required tool interactions without detailed user prompts. OpenAI disclosed targeted safety work, including additional testing of advanced cybersecurity and biology capabilities, and said nearly 200 trusted partners participated in early-access testing.
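The agentic pattern described above can be made concrete with a minimal sketch. This is not OpenAI's implementation or API; the `mock_model`, `TOOLS` table, and message shapes are illustrative stand-ins for the plan / call-tool / observe / respond loop that models like GPT-5.5 are tuned for.

```python
# Hypothetical sketch of an agentic tool-use loop: the model proposes a tool
# call, the harness executes it, and the observation is fed back into context
# until the model emits a final answer. All names here are illustrative.

def mock_model(messages):
    """Stand-in for a real inference call: requests a tool, then finishes."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "tool": "run_tests", "args": {"path": "src/"}}
    return {"type": "final", "content": "All tests pass; patch is ready."}

# Tool registry the harness exposes to the model (fabricated example tool).
TOOLS = {
    "run_tests": lambda args: f"pytest {args['path']}: 42 passed",
}

def agent_loop(task, model=mock_model, max_steps=5):
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = model(messages)
        if reply["type"] == "final":
            return reply["content"]
        # Execute the requested tool and append the observation to context.
        result = TOOLS[reply["tool"]](reply["args"])
        messages.append({"role": "tool", "content": result})
    raise RuntimeError("step budget exhausted")

print(agent_loop("Fix the failing test in src/"))
```

The "improved ambiguity handling" OpenAI claims would show up in this loop as the model choosing the right tool and arguments from a terse task description, rather than requiring the user to spell out each step.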
Notable benchmarks and real-world signals
The release includes side-by-side comparisons: GPT-5.5 Pro at 39.6% on FrontierMath Tier 4 versus Claude Opus 4.7 at 22.9%, and GPT-5.5 at 82.7% on Terminal-Bench 2.0 versus Claude at 69.4%. OpenAI reports improvements on proprietary internal measures like Expert-SWE and GDPval as well. Early production uses include optimizing OpenAI infrastructure software running on NVIDIA GB200 and GB300 NVL72 systems, and a customized GPT-5.5 instance assisting researchers to discover a new proof related to Ramsey numbers, indicating practical gains in exploratory mathematical research.
Capabilities and product posture
GPT-5.5 is presented as a bridge toward more agentic computing and a so-called super app that composes ChatGPT, Codex, and browsing capabilities. OpenAI rolled out the model to its consumer and enterprise chat products first, holding API distribution pending scaled safety validation. OpenAI also offers a premium GPT-5.5 Pro tier for higher-stakes use cases in business, legal, education, and data science.
- Improved agentic coding and end-to-end tool chains
- Stronger long-context reasoning and ambiguity resolution
- Faster, more token-efficient code generation compared to GPT-5.4
Context and significance
This release continues the rapid cadence of incremental frontier-model improvements from major labs. GPT-5.5 signals two practical shifts: first, capability improvements that materially help professional workflows, especially program synthesis, system automation, and exploratory math; second, a product strategy that emphasizes controlled product rollout before broad API access, balancing adoption and risk management. The model advances the competitive dynamic with Anthropic and Google, and underlines how hardware partnerships with NVIDIA remain central for deploying larger inference stacks at scale.
What to watch
Track API availability and the exact safety guardrails OpenAI requires for external deployments, the commercial terms for GPT-5.5 Pro, and independent benchmark replications. Also watch how downstream tooling and agent frameworks update to exploit the model's improved tool use and token efficiency, and whether competitors narrow or widen the gap on hard math and system-in-the-loop coding tasks.
Bottom line
GPT-5.5 is a technically meaningful upgrade for code-centric and research workflows, shipped with explicit safety constraints and staged availability. Practitioners should evaluate its token efficiency, latency parity, and improved ambiguity handling for agentic pipelines, while monitoring API access policies and independent benchmarking results.
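The token-efficiency evaluation recommended above can be as simple as comparing per-task completion-token counts across models on a fixed task set. The sketch below assumes you have already logged such counts; the model names mirror the article, but the numbers are fabricated for illustration.

```python
# Illustrative harness for evaluating token efficiency: compare mean
# completion tokens per coding task between two models on the same tasks.
# Token counts below are made up for the example.
from statistics import mean

runs = {
    "gpt-5.4": [1450, 1620, 1380],  # completion tokens per task (fabricated)
    "gpt-5.5": [1100, 1240, 1050],
}

baseline = mean(runs["gpt-5.4"])
candidate = mean(runs["gpt-5.5"])
savings = 1 - candidate / baseline  # fractional reduction in tokens

print(f"mean tokens: {baseline:.0f} -> {candidate:.0f} ({savings:.1%} fewer)")
```

Running the same comparison on wall-clock latency per token would cover the latency-parity claim as well.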
Scoring Rationale
This is a major frontier-model upgrade with measurable benchmark and practical gains across coding and math, and it shifts how practitioners can build agentic systems. The staged API rollout and explicit safety posture moderate immediate wide-scale integration risks, but the release materially affects the competitive landscape.
