What happened
Axios reports that Palo Alto Networks scanned more than 130 products with frontier AI models, including Claude Mythos and OpenAI cyber-focused models, and identified 75 vulnerabilities that have been patched, up from a typical 5-10 monthly finding rate, according to Axios. The Register similarly reports the company used Anthropic's Mythos among other models to scan its codebase and disclosed 26 CVEs as part of the findings. Axios adds that none of the vulnerabilities identified by Palo Alto Networks were observed to be exploited in the wild.
SecurityWeek reports that Microsoft's MDASH system, which orchestrates more than 100 specialized agents across frontier and distilled models, discovered 16 of the vulnerabilities fixed in the latest Patch Tuesday release; Microsoft said four of those were rated critical. ComputerWeekly reports that Microsoft's April Patch Tuesday update overall contained over 160 distinct flaws and that industry discussion has linked the spike to wider use of AI tools such as Anthropic's Project Glasswing and Claude Mythos.
Technical details
SecurityWeek describes MDASH as a multi-stage, agentic pipeline that runs preparation, scanning, validation, deduplication, and proof-construction agents; Microsoft reported that in internal tests MDASH recovered 96% and 100% of confirmed vulnerabilities on two heavily audited components and performed strongly on the public CyberGym benchmark, per SecurityWeek. Axios reports Palo Alto Networks told reporters that in internal testing the frontier models generated working exploits more than 70% of the time and that the company observed an average false-positive rate of about 30%.
Industry context
ComputerWeekly reports Anthropic has framed Project Glasswing (and a Mythos preview) as capable of discovering and, in some cases, developing exploits for zero-day flaws, prompting tight access controls around the tools. VentureBeat and other reporting earlier this year documented incidents where adversaries compromised AI security tools to exfiltrate data; VentureBeat warned those compromises could escalate once agentic tools gain write access to infrastructure. The Register quotes Dustin Childs of Trend Micro's Zero Day Initiative saying the immediate effect of AI-driven discovery will be more patches and more work for administrators, and that patch trust may erode if fixes break production.
Editorial analysis - technical context
Industry observers note that frontier models are improving at multi-step reasoning and chaining discrete findings into exploit paths, which raises the volume and severity of candidate findings. Companies that have tested these systems report higher recall for hidden bugs but also nontrivial false-positive rates and a need for human validation. For practitioners, that combination increases the importance of robust triage pipelines: vulnerability deduplication, exploitability validation, regression testing, and staged deployment workflows.
What to watch
Watch for wider adoption of guarded access programs (the kinds of controls ComputerWeekly describes around Project Glasswing) and for vendor disclosures that quantify false-positive rates and exploitability proofs. Observers should also follow whether disclosure volumes remain elevated as defenders scale human-in-the-loop validation, and whether patch deployment rates lag behind discovery rates, a gap that sources warn could increase operational risk.
Bottom line
Axios, The Register, SecurityWeek, ComputerWeekly, and VentureBeat document a clear increase in vulnerability discoveries after applying frontier AI models to vendor code. The technical gains in discovery come with operational friction: higher triage load, nontrivial false positives, and questions about secure handling of agentic tools, per the cited reporting.
Key Points
- 1Palo Alto Networks used frontier models to scan 130+ products and found 75 vulnerabilities, highlighting higher discovery volume.
- 2Microsoft's MDASH multi-agent pipeline found 16 Patch Tuesday flaws and showed strong recovery on benchmark tests.
- 3Industry observers note improved exploit-chaining by models increases triage burdens and amplifies the need for validation pipelines.
Scoring Rationale
Frontier AI materially changes vulnerability discovery capacity and exploit-generation ability; practitioners must reassess tooling and triage workflows. The story affects many security teams and vendor disclosure volumes, making it industry-significant but not a single historic event.
Practice interview problems based on real data
1,625 SQL & Python problems across 15 industry datasets — the exact type of data you work with.
Try 250 free problems


