Project Glasswing's first month produced 23,019 vulnerability findings across more than 1,000 open-source projects, and 90.6% of the high- or critical-rated findings triaged so far were confirmed as true positives. One flaw in the wolfSSL library would have let attackers forge certificates that clients on roughly five billion devices would accept. Patching, not finding, is now the bottleneck.

The wolfSSL cryptography library is embedded in roughly five billion devices. Routers, satellites, automotive systems, industrial control gear, and the smart locks on suburban front doors all depend on it.

On April 8, 2026, the wolfSSL team shipped version 5.9.1 to patch a critical flaw, later catalogued as CVE-2026-5194 and given a base CVSS score of 9.3 by the National Vulnerability Database. The vulnerability would have let an attacker forge digital certificates, host a perfectly legitimate-looking fake banking website, and impersonate any service that relies on the library.

The vulnerability was found by an AI.

Anthropic disclosed the wolfSSL finding on May 22, as part of its first formal update on Project Glasswing, the initiative the company launched in April to put its Claude Mythos Preview model in the hands of approximately 50 enterprise and infrastructure partners. The update reads less like a research report and more like a warning siren.

In its first month, Mythos Preview and the Glasswing partners found more than 10,000 high- or critical-severity vulnerabilities in software that underpins the internet.

Anthropic separately scanned over 1,000 open-source projects with the model, flagging 23,019 issues in total. Of the 1,752 high- or critical-rated findings that have been independently triaged so far, 90.6% were confirmed as true positives.

The bottleneck in software security has officially shifted. Finding bugs is no longer the hard part. Fixing them is.

The wolfSSL Discovery Is the One Practitioners Should Read Twice

Cryptographers spend years auditing libraries like wolfSSL. The library has been a standard for embedded systems for two decades. It runs in the firmware that protects bank transactions, military communications, vehicle telematics, and connected industrial equipment. Per a public technical writeup, the library is shipped on roughly five billion devices.

Mythos Preview identified a flaw in how wolfSSL validated ECDSA signatures: the library accepted hash digests that were smaller than what the algorithm required. By exploiting that gap, an attacker could craft a forged digital certificate that vulnerable wolfSSL clients would accept as legitimate. From there, the attacker could stand up a malicious server that looked indistinguishable from a real bank or email provider.
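The class of check described above can be sketched in a few lines. This is a minimal illustration in Python, not wolfSSL's code or its patch: the update says only that undersized digests were accepted, so the function names `digest_length_ok` and `require_full_digest`, and the policy of demanding an exact length match with the declared hash algorithm, are assumptions made for the example.

```python
import hashlib


def digest_length_ok(hash_name: str, digest: bytes) -> bool:
    """Return True only if `digest` has the full length `hash_name` produces."""
    # hashlib reports the output size of a named algorithm (32 bytes for SHA-256).
    expected_len = hashlib.new(hash_name).digest_size
    return len(digest) == expected_len


def require_full_digest(hash_name: str, digest: bytes) -> bytes:
    """Hypothetical gate to run before signature verification (illustrative only)."""
    if not digest_length_ok(hash_name, digest):
        raise ValueError(
            f"digest is {len(digest)} bytes, which does not match {hash_name}"
        )
    return digest


full = hashlib.sha256(b"certificate body").digest()
truncated = full[:8]  # an undersized digest of the kind the flaw let through

print(digest_length_ok("sha256", full))       # True
print(digest_length_ok("sha256", truncated))  # False
```

The point of the sketch is the ordering: the length check runs before any signature math, so a short digest is rejected outright rather than passed to the verifier.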

"[Mythos Preview] constructed an exploit that would let an attacker forge certificates that would (for instance) allow them to host a fake website for a bank or email provider. The website would look perfectly legitimate to an end user, despite being controlled by the attacker."

— Anthropic, in its May 22 Project Glasswing update

CVE-2026-5194 is listed in the National Vulnerability Database (NVD), and the patched version (5.9.1) has been available since April 8. Anthropic has not yet released its full technical writeup; it says the analysis will be published in the coming weeks, once enough end users have updated.

The Open-Source Numbers Are Even More Striking

Mythos Preview has so far scanned 1,000-plus open-source projects, including those that Anthropic itself relies on. It returned 23,019 issues. Of these, the model classified 6,202 as high- or critical-severity on first pass.

Six independent security research firms (and, in a small number of cases, Anthropic itself) triaged 1,752 of those high- or critical-rated findings. The results:

90.6% (1,587) were validated as real, exploitable vulnerabilities
62.4% (1,094) were confirmed at high- or critical-severity after expert review
Projected total at current rates: nearly 3,900 high- or critical-severity vulnerabilities in open-source code alone
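The percentages above can be reproduced from the raw counts. The short Python check below does that. The projection step is an assumption: the update does not state its method, but applying the confirmed high- or critical-severity rate to all 6,202 first-pass findings lands close to the "nearly 3,900" figure.

```python
# Counts taken from the figures reported above.
flagged_high_or_critical = 6202    # model's first-pass high/critical classification
triaged = 1752                     # findings reviewed independently so far
true_positives = 1587              # validated as real vulnerabilities
confirmed_high_or_critical = 1094  # severity confirmed after expert review


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to one decimal place."""
    return round(100 * part / whole, 1)


true_positive_rate = pct(true_positives, triaged)
confirmed_rate = pct(confirmed_high_or_critical, triaged)

# Assumed method: extrapolate the confirmed rate to every first-pass finding.
projected = flagged_high_or_critical * confirmed_high_or_critical / triaged

print(true_positive_rate)  # 90.6
print(confirmed_rate)      # 62.4
print(round(projected))    # 3873
```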

Mozilla, scanning Firefox with Mythos Preview ahead of the version 150 release, found and fixed 271 vulnerabilities. That was more than ten times the count it found in Firefox 148 using Claude Opus 4.6, the prior generation of Anthropic's model.

Cloudflare reported finding 2,000 bugs in its critical-path systems, of which 400 were high- or critical-severity. Cloudflare's security team told Anthropic that Mythos's false positive rate was lower than that of human testers working on the same code.

The UK's AI Security Institute, an independent government evaluator, reported that Mythos Preview was the first AI model to solve both of its cyber ranges, which simulate multistep cyberattacks, end-to-end. XBOW, a security platform, called the model a "significant step up over all existing models" and said it provides "absolutely unprecedented precision" on a token-for-token basis.

Patching Has Become the Real Bottleneck

The flip side of the discovery firehose is visible in the patch numbers.

Of the 530 high- or critical-severity bugs Anthropic and its partners have disclosed to maintainers so far, only 75 have been patched, and 65 have public security advisories. Another 827 confirmed vulnerabilities are queued and will be disclosed as fast as the pipeline can move.

"Several maintainers have told us they're currently severely capacity constrained, and some have even asked us to slow down our rate of our disclosures because they need more time to design patches."

— Anthropic, May 22, 2026

The math is brutal for maintainers. Anthropic puts the average high- or critical-severity patch cycle at two weeks. With reports landing in batches of hundreds, the queue grows faster than volunteer-led projects can clear it.
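A rough version of that math, in Python. The counts and the two-week cycle come from the figures above. Everything else is an assumption for illustration: the helper `weeks_to_clear`, the idea that a fixed number of patches can be in flight at once, and the value 50 used for that number are hypothetical and do not appear in Anthropic's update. The sketch also assumes all 827 queued vulnerabilities will need patches.

```python
import math

# Figures reported above.
disclosed = 530   # high/critical bugs disclosed to maintainers
patched = 75      # of those, patched so far
queued = 827      # confirmed vulnerabilities awaiting disclosure
CYCLE_WEEKS = 2   # average high/critical patch cycle, per Anthropic

patched_share = round(100 * patched / disclosed, 1)
awaiting_patch = disclosed - patched
outstanding = awaiting_patch + queued  # assumes every queued item needs a patch


def weeks_to_clear(backlog: int, parallel_patches: int,
                   cycle_weeks: int = CYCLE_WEEKS) -> int:
    """Weeks to clear `backlog` if `parallel_patches` fixes land per cycle.

    `parallel_patches` is a hypothetical capacity figure, not reported data.
    """
    if parallel_patches <= 0:
        raise ValueError("parallel_patches must be positive")
    return math.ceil(backlog / parallel_patches) * cycle_weeks


print(patched_share)                    # 14.2
print(outstanding)                      # 1282
print(weeks_to_clear(outstanding, 50))  # 52, with 50 as an arbitrary capacity
```

Under these assumptions, clearing the existing backlog takes about a year even if no new findings arrive, which is the sense in which the queue outruns volunteer-led projects.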

Some commercial vendors are scaling their responses. Palo Alto Networks said its most recent release contained five times more patches than normal. Microsoft told customers that Patch Tuesday is going to keep getting larger "for some time." Oracle said it is now finding and fixing vulnerabilities across its products and cloud "multiple times faster" than before.

But for the maintainers of small, critical open-source libraries, the kind that underpin a billion devices and are kept up by a single person on weekends, the same firehose looks like a denial-of-service attack on their inbox.

The Other Side: AI Bug Reports Have a Bad Reputation

The honest counterargument is that the open-source community has been drowning in bad AI-generated security reports for over a year. Through 2024 and 2025, well-meaning users and bug bounty hunters submitted thousands of reports generated by general-purpose chatbots that hallucinated CVEs, misread code, or misrepresented severity. Daniel Stenberg, the maintainer of curl, publicly complained about the trend; the Python Software Foundation issued guidelines on AI-assisted submissions.

Anthropic's Glasswing process differs from those bad submissions in important ways. Reports are pre-triaged by independent firms. The 90.6% true-positive rate, if it holds across thousands more reports, is materially better than what most human-submitted bug bounty reports achieve. And the average severity is high. These are not theoretical buffer overflows; they are exploitable certificate forgeries and remote code execution paths.

But the perception problem persists. A maintainer staring at a hundred AI-generated reports has no easy way to tell which were written by Mythos at Anthropic and which were generated by a free chatbot. Anthropic's own update acknowledges the strain: it notes that some maintainers have asked it to slow the rate of disclosures. The infrastructure to verify AI-found bugs has not caught up with the infrastructure to find them.

What Practitioners Should Take From This

For data scientists, ML engineers, and security teams, three takeaways matter.

First, the bottleneck has moved. If your team's security process assumes that vulnerability discovery is the expensive step, that assumption is now wrong for the kind of bugs Mythos finds. The expensive step is verification and patching. Internal security backlogs need to be rebuilt around triage capacity.

Second, the capabilities of Mythos Preview will be commoditized. Anthropic has been explicit that "models as capable as Mythos Preview will soon be developed by many different AI companies." That means attackers will eventually get similar capabilities. The race is to close the patch deployment gap before they do. Google caught the first AI-built zero-day in the wild earlier this month, and the exploit code was already self-generating.

Third, public AI models can already do meaningful security scanning. Anthropic's separately released Claude Security tool, currently in public beta for Claude Enterprise customers, runs on Claude Opus 4.7, the generally available production model rather than Mythos, and has been used to patch more than 2,100 vulnerabilities in three weeks. Teams that have not started running their own codebases through a security-tuned model are leaving discoverable bugs on the table that an attacker may find first.

Recent supply chain attacks on PyTorch Lightning and poisoned VS Code extensions have shown how fast the offensive side moves. Mythos's numbers show how fast the defensive side can move now that AI is in the loop.

The Bottom Line

A month ago, Project Glasswing looked like a clever positioning move: hand the most capable security AI to your biggest enterprise partners, generate goodwill, and quietly demonstrate Mythos's capabilities to the US government and key infrastructure players.

A month later, it looks like a stress test of the entire open-source security ecosystem. Mythos found more than ten thousand serious bugs across the software that runs the internet. It found a flaw in a cryptography library that secures roughly five billion devices. And it produced disclosures faster than human maintainers can patch them.

Anthropic's own framing in the update was unusually plain for a corporate blog: "no company — including Anthropic — has developed safeguards strong enough to prevent such models from being misused and potentially causing severe harm."

The good case is that the patch infrastructure catches up, and the world ends up with vastly safer software. The bad case is that an adversary gets a Mythos-class model before the patches deploy, and the same precision that protected wolfSSL gets pointed the other way.

The window between those two cases is now measured in the same two weeks it takes to patch one critical bug.