Anthropic's Mythos Raises Cybersecurity Risk Questions
Anthropic unveiled Claude Mythos Preview, a model the New York Times reports the company restricted to a small set of organisations because of its ability to find and weaponize software vulnerabilities. Independent evaluations by the United Kingdom's AI Security Institute (AISI) and testing by Palo Alto Networks found Claude Mythos Preview and OpenAI's GPT-5.5 have substantially outpaced previous trends in autonomous cyber capability, with AISI reporting Mythos completed a 32-step simulated attack in 6 of 10 attempts and GPT-5.5 in 3 of 10, according to Cyberscoop. Reuters reports the Pentagon is using Mythos under Anthropic's Project Glasswing to patch vulnerabilities while also planning to wean federal systems off Anthropic products. Reporting in The Parliament Magazine and other outlets notes limited EU access and a debate among security experts and policymakers about controlled distribution versus wider research access.
What happened
Anthropic announced Claude Mythos Preview and, according to The New York Times, limited access to a small group of organisations rather than releasing it publicly. The New York Times reports Anthropic shared the technology with about 40 organisations, including Microsoft, Apple, Amazon Web Services, CrowdStrike and JPMorgan Chase. Reuters reports Anthropic operates Project Glasswing, a controlled initiative that permits select organisations to run the unreleased model for defensive hardening. Reuters also reports the U.S. Department of Defense (DoD) is deploying Mythos for vulnerability scanning while moving to transition away from Anthropic products, and that the DoD labelled Anthropic a supply-chain risk after the two sides failed to reach an agreement.
The UK's AI Security Institute (AISI) and Palo Alto Networks published testing results reported by Cyberscoop that place Claude Mythos Preview and OpenAI's GPT-5.5 well ahead of prior capability trends. The AISI reported that Mythos completed its previously unsolved "Cooling Tower" range in multiple trials and solved the 32-step "The Last Ones" scenario in 6 of 10 attempts, while GPT-5.5 solved that same scenario in 3 of 10 attempts, per Cyberscoop. Multiple outlets including Reuters and The Parliament Magazine report that access has been primarily US-centric, leaving some EU actors without preview access.
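For readers weighing the 6-of-10 and 3-of-10 figures, a rough sketch of the statistical uncertainty in ten-trial results may help. The Python below computes a 95% Wilson score interval for each reported success rate. It is an illustration only, not AISI's methodology: it assumes the ten attempts are independent trials, and the function names are hypothetical.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Return the (low, high) Wilson score interval for a binomial proportion.

    z=1.96 corresponds to roughly 95% confidence. Assumes independent trials,
    which the reporting does not confirm for these evaluations.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half_width = z * math.sqrt(
        p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)
    ) / denom
    return center - half_width, center + half_width


def intervals_overlap(a, b):
    """True if two (low, high) intervals share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Success counts as reported by Cyberscoop for "The Last Ones" scenario.
mythos = wilson_interval(6, 10)
gpt = wilson_interval(3, 10)

print(f"Mythos 6/10: {mythos[0]:.2f} to {mythos[1]:.2f}")
print(f"GPT-5.5 3/10: {gpt[0]:.2f} to {gpt[1]:.2f}")
print("Intervals overlap:", intervals_overlap(mythos, gpt))
```

Under these assumptions the two intervals are wide and overlap, so ten attempts per model give only a coarse estimate of the gap between them. That does not affect the separate finding that both models solved a scenario reported as previously out of reach.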
Editorial analysis - technical context
Independent evaluations cited in reporting show a discrete capability jump in autonomous cyber tasks for frontier models. Industry measurements referenced by AISI indicate that the length of tasks models can complete autonomously has been growing rapidly; Cyberscoop reports AISI found performance improvements that outpaced the institute's recent doubling trend. For practitioners, this suggests red-team and blue-team cycles that previously took weeks may compress to days or hours when aided by advanced models. These tests focused on structured cyber-range scenarios, which provide controlled, repeatable benchmarks but do not capture all aspects of real-world network complexity.
Industry context
Editorial analysis: Reporting places Claude Mythos Preview in a broader pattern where frontier models are producing outsized capability jumps that spur government attention and vendor countermeasures. Reuters documents the DoD's dual posture - using Mythos for defensive patching while seeking to reduce reliance on the vendor - illustrating how national security stakeholders are treating frontier-model access as both an operational asset and a procurement risk. Other vendors are responding: press coverage notes OpenAI and other firms outlining competing cyber-capability offerings, and enterprises including banks and browser projects are reported to be urgently patching issues surfaced by Mythos, per Reuters and Mozilla's reporting.
What to watch
- Adoption and governance: observers will track whether governments formalise pre-deployment review processes for high-risk models; The New York Times reports the White House is considering an executive order to create an AI working group.
- Benchmark replication: AISI and Palo Alto Networks results should be reproducible or contested; independent reproductions will clarify whether reported capability jumps are model-specific or represent a broader inflection.
- Access and disclosure practices: reporting in The Parliament Magazine highlights geographic asymmetries in preview access, which may affect cross-border resilience and disclosure timelines.
Editorial analysis: For practitioners, the immediate implication is operational: organisations that maintain critical infrastructure should assume frontier models can materially accelerate both discovery and exploitation of vulnerabilities. Research and policy communities, meanwhile, will need to balance controlled access against the benefits of broader evaluation and defensive hardening.
Scoring Rationale
Frontier-model capability claims and independent tests reported by reputable outlets represent a notable inflection in autonomous cybersecurity capability; the story prompted government action and vendor responses, making it highly relevant to practitioners.

