GPT-5.5 Matches Mythos in Cyber Vulnerability Tests

The UK AI Security Institute (AISI) evaluated OpenAI's GPT-5.5 and found its offensive cyber capabilities roughly comparable to those of Anthropic's Claude Mythos Preview, according to reporting by The Decoder and Yahoo/Decrypt. In AISI's tests, GPT-5.5 averaged a 71.4% pass rate on the most difficult "Expert" tasks versus 68.6% for Claude Mythos, and it completed a 32-step corporate network simulation in 2 of 10 attempts versus Mythos' 3 of 10. Yahoo/Decrypt also reports that, per the AISI report, GPT-5.5 solved a reverse-engineering challenge in 10 minutes 22 seconds at an estimated API cost of $1.73, compared with roughly 12 hours for a human. Politico and Cybernews report that OpenAI is offering a limited preview, GPT-5.5-Cyber, to vetted defenders under a Trusted Access for Cyber framework. Editorial analysis: for practitioners, these results underscore that general LLM progress is producing stronger dual-use cyber capabilities alongside defensive tooling.
What happened
The UK AI Security Institute (AISI) ran a suite of offensive cybersecurity evaluations that included OpenAI's GPT-5.5 and found its performance near parity with Anthropic's Claude Mythos Preview, according to reporting by The Decoder and Yahoo/Decrypt. Per those reports, GPT-5.5 achieved an average pass rate of 71.4% on tasks in AISI's highest "Expert" tier, compared with 68.6% for Claude Mythos.
The same coverage reports that GPT-5.5 completed AISI's 32-step corporate network simulation (a scenario described by AISI and SpecterOps) in 2 of 10 attempts, while Claude Mythos completed it in 3 of 10. Yahoo/Decrypt additionally reports that GPT-5.5 solved a complex reverse-engineering puzzle in 10 minutes 22 seconds at an estimated API cost of $1.73, versus approximately 12 hours for a human expert. Politico and Cybernews report that OpenAI has opened a limited preview of GPT-5.5-Cyber to vetted cybersecurity professionals under a Trusted Access for Cyber program, which reduces classifier refusals for legitimate defensive workflows.
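As a rough sanity check on the reported figures (not part of AISI's published analysis), the short Python sketch below computes the implied speed-up on the reverse-engineering task and a 95% Wilson score interval for each model's completion rate on the 32-step network simulation. The inputs are the numbers as reported by The Decoder and Yahoo/Decrypt; the choice of the Wilson interval is our own illustrative assumption and says nothing about how AISI itself analysed its results.

```python
import math

# Figures as reported by The Decoder and Yahoo/Decrypt (not independently verified).
MODEL_SECONDS = 10 * 60 + 22      # GPT-5.5: 10 minutes 22 seconds
HUMAN_SECONDS = 12 * 60 * 60      # human expert: "approximately 12 hours"
API_COST_USD = 1.73               # estimated API cost for the model's run
ATTEMPTS = 10                     # attempts per model on the 32-step simulation


def speedup(model_s: float, human_s: float) -> float:
    """Ratio of human time to model time for the same task."""
    return human_s / model_s


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96).

    Illustrative assumption: this is our choice of interval, not AISI's method.
    """
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return center - half, center + half


if __name__ == "__main__":
    ratio = speedup(MODEL_SECONDS, HUMAN_SECONDS)
    print(f"speed-up: ~{ratio:.0f}x at an estimated ${API_COST_USD}")
    for name, wins in (("GPT-5.5", 2), ("Claude Mythos", 3)):
        lo, hi = wilson_interval(wins, ATTEMPTS)
        print(f"{name}: {wins}/{ATTEMPTS}, 95% Wilson interval {lo:.2f} to {hi:.2f}")
```

With these inputs the speed-up works out to roughly 69x, and the two intervals (about 0.06 to 0.51 for GPT-5.5 and 0.11 to 0.60 for Claude Mythos) overlap heavily. That is consistent with reading one extra completion in ten attempts as near parity rather than a meaningful gap.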
Editorial analysis - technical context
Industry-pattern observations: Independent evaluations by government or third-party labs commonly show that incremental frontier-model improvements translate into outsized gains on structured, multi-step, and programmatic tasks. The AISI results, as reported, are consistent with that pattern: stronger reasoning, code generation, and tool use can let models chain reconnaissance, exploit construction, and lateral-movement steps that previously required significant human engineering.
Context and significance
Multiple outlets, including The Decoder and Yahoo/Decrypt, frame these findings as part of a broader trend in which offensive cyber capabilities emerge as a by-product of improvements in the autonomy, reasoning, and coding ability of large models rather than from explicit adversarial training. For security teams and defenders, reported parity between GPT-5.5 and a heavily restricted model like Claude Mythos increases pressure on access controls, red-teaming practices, and incident response playbooks, even as vendors try to enable defensive use cases through vetting and trust frameworks.
What to watch
Observers will track:
- AISI and other labs publishing full methodology and dataset details to enable reproducibility
- adoption and operational rules for programs like Trusted Access for Cyber, as reported by Politico and Cybernews
- vendor policy changes around classifier behavior and access gating for high-risk capabilities
- independent retests against environments with active defenses, rather than the isolated networks used in these AISI scenarios
Notes on sources and public statements
Reporting by The Decoder, Yahoo/Decrypt, Politico, Cybernews, and others summarizes AISI's findings and OpenAI's limited preview rollout. Cybernews quotes OpenAI on balancing safeguards and defender access: "We are focused on providing proportional safeguards and access to empower cyber defenders to protect society, and our approach has been informed by conversations with cybersecurity and national security leaders across federal and state government and major commercial entities." AISI's full report and raw test artefacts remain the critical primary documents for validating methodology and scope, so their release timing matters.
Scoring Rationale
AISI's reported finding that `GPT-5.5` matches `Claude Mythos` on complex cyber tasks is a significant development for practitioners, because it shows that general frontier-model capabilities can enable complex, chained offensive workflows. The story is recent but not brand-new, so its impact is notable rather than historic.

