OpenAI Releases GPT-5.5 With Expanded Cybersecurity Safeguards
OpenAI released GPT-5.5, a retrained base model built for agentic coding, multi-step workflows, and research tasks, while introducing its tightest safeguards to date. The model is rolling out to Plus, Pro, Business, and Enterprise users in ChatGPT and Codex, with GPT-5.5 Pro available to higher tiers and API access coming soon. OpenAI rates GPT-5.5 as High for cybersecurity capability under its Preparedness Framework, meaning it can amplify existing pathways to severe harm but does not meet the company's threshold for a "Critical" rating. GPT-5.5 matches GPT-5.4's per-token latency while improving performance and token efficiency on coding benchmarks, including Terminal-Bench 2.0 (82.7% versus 75.1% for GPT-5.4, according to OpenAI). The release signals a competitive response to Anthropic's latest models and sharpens the industry's focus on balancing capability gains with targeted safety controls.
What happened
OpenAI released GPT-5.5, a fully retrained base model positioned for complex, real-world work including coding, software operation, data analysis, and multi-step research tasks. The company says GPT-5.5 delivers higher intelligence while matching GPT-5.4 per-token latency and using significantly fewer tokens on Codex tasks. The rollout targets Plus, Pro, Business, and Enterprise ChatGPT and Codex users, with GPT-5.5 Pro available to higher tiers and API availability planned soon.
Technical details
GPT-5.5 is described as more agentic and efficient, able to plan across tools, check its work, and continue through ambiguity with less human direction. OpenAI highlighted benchmark gains, including Terminal-Bench 2.0 performance of 82.7% versus GPT-5.4 at 75.1%, and improved outcomes on internal coding evaluations like Expert-SWE. The release notes emphasize latency parity with GPT-5.4 despite higher capability and reduced token consumption for equivalent Codex tasks. OpenAI paired the model with an expanded safety stack: the GPT-5.5 System Card, targeted red-team testing, and early feedback from nearly 200 trusted partners.
- Capabilities that matter in practice include writing and debugging code, operating software and toolchains, researching online, creating documents and spreadsheets, and sustaining multi-step workflows.
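To put the reported Terminal-Bench 2.0 scores in perspective, the short sketch below works out what the gap between 82.7% and 75.1% means in absolute and relative terms. The function name and the returned keys are illustrative and do not come from OpenAI; the only inputs are the two scores quoted above, and the "failure reduction" figure assumes that every task not passed counts as a failure.

```python
def benchmark_gain(new_pct: float, old_pct: float) -> dict:
    """Summarize the gap between two benchmark pass rates given in percent.

    Hypothetical helper for illustration only.
    """
    # Absolute gap in percentage points.
    point_gain = new_pct - old_pct
    # Gap relative to the older model's score.
    relative_gain = point_gain / old_pct * 100
    # Share of the older model's failures that the newer model no longer has,
    # assuming failure rate = 100 - pass rate.
    old_fail = 100 - old_pct
    new_fail = 100 - new_pct
    failure_reduction = (old_fail - new_fail) / old_fail * 100
    return {
        "point_gain": round(point_gain, 1),
        "relative_gain_pct": round(relative_gain, 1),
        "failure_reduction_pct": round(failure_reduction, 1),
    }


# Scores as reported by OpenAI for Terminal-Bench 2.0.
summary = benchmark_gain(new_pct=82.7, old_pct=75.1)
print(summary)
```

On the reported numbers this works out to a 7.6-point gain, about 10% relative to GPT-5.4's score, and roughly 30% fewer failed tasks. These are vendor-reported figures, so the arithmetic says nothing about how the gap will hold up in independent evaluations.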
Safeguards and risk classification
OpenAI places GPT-5.5 in the High category for cybersecurity, biological, and chemical capability under its Preparedness Framework. That classification signals the model can amplify existing pathways to severe harm but, per OpenAI, does not cross the internal threshold for a "Critical" cybersecurity risk. The company says it ran expanded adversarial testing, added targeted cybersecurity and bio capability checks, and iterated with internal and external red teamers before release. OpenAI also notes that API deployments require additional, partner-specific safeguards before wide external access.
Context and significance
GPT-5.5 arrives amid an accelerated arms race among frontier model providers. Anthropic recently released advanced models emphasizing safety and cybersecurity capability, prompting rapid follow-on work from OpenAI. The combination of higher agentic capability, token efficiency, and matched latency is notable for enterprise adoption: it reduces compute and latency penalties that often accompany capability jumps. The explicit High risk classification is also significant because it publicly acknowledges stronger misuse potential while showing how a major provider is operationalizing safety gating alongside product rollout.
What to watch
Monitor how OpenAI operationalizes the additional safeguards for API customers and enterprise deployments, and watch for benchmark head-to-heads with Anthropic and Google as independent evaluations appear. Security teams should reassess their threat models and testing around code generation and automated tool use in light of GPT-5.5's capabilities.
Scoring Rationale
This is a major frontier model release that advances agentic capabilities and efficiency while adding the strongest safeguards from OpenAI so far. It reshapes enterprise and security tradeoffs and tightens competition with Anthropic, making it highly relevant to practitioners.
