On Monday morning, John Hultquist published the report that ended a years-long debate inside the cybersecurity industry.
The chief analyst at Google's Threat Intelligence Group had been sitting on a case file his team was still piecing together. A criminal crew had assembled what looked like a routine pre-attack kit: a two-factor authentication bypass for a popular open-source web-based administration tool, the kind of platform that sits between system administrators and tens of thousands of corporate networks. The plan, GTIG concluded, was a mass-exploitation campaign. The kind of operation that ends with ransomware notes on hospital screens and extortion demands in a finance team's inbox.
What made the file different was the code itself. The Python script the criminals had written contained tells no human author would leave behind. The docstrings read like passages from a security textbook. The Common Vulnerability Scoring System number embedded in the exploit was hallucinated, a confident value attached to a vulnerability that did not yet have one. The structure was clean in a way real exploit code almost never is.
Google's analysts traced the work back to a large language model. Not Gemini. Not Anthropic's Claude Mythos. Something else, something the company would not name. But the fingerprints were there.
"It's here," Hultquist told reporters. "The era of AI-driven vulnerability and exploitation is already here."
The Bug Was Real, the Exploit Was Polished, the Author Was Not Human
The vulnerability itself was the kind of mistake that costs companies enterprise contracts. Developers of the unnamed administration tool had hard-coded a trust exception into their authentication flow, a shortcut that let certain requests skip the two-factor check. In a normal review cycle, that exception would have been caught by a senior engineer or flagged by a static analysis pass. It was not.
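Google has not named the tool or published the code, so any reconstruction is guesswork. But the flaw class is familiar enough that a minimal sketch, with every identifier invented for illustration, makes the report's language concrete:

```python
# Hypothetical reconstruction; Google has not named the tool, and every
# identifier here is invented. The flaw class: a hard-coded trust
# exception that lets certain requests skip the second factor.

import hmac

USERS = {"admin": {"password": "hunter2", "totp_secret": "JBSWY3DP"}}

# The shortcut a developer left behind: requests presenting this header
# value are treated as pre-verified internal traffic.
TRUSTED_AGENT = "acme-sync-service/1.0"

def check_password(user: str, password: str) -> bool:
    record = USERS.get(user)
    return record is not None and hmac.compare_digest(record["password"], password)

def check_totp(user: str, code: str) -> bool:
    # Stand-in for real TOTP verification (e.g. via pyotp).
    return code == "123456"

def verify_login(user: str, password: str, otp: str, headers: dict) -> bool:
    if not check_password(user, password):
        return False
    # The bug: any client that spoofs this User-Agent string skips the
    # second factor entirely. It is a logic flaw, not a crash, so there
    # is nothing here for a fuzzer to trip over.
    if headers.get("User-Agent") == TRUSTED_AGENT:
        return True
    return check_totp(user, otp)
```

A valid password plus a spoofed header yields a full session; the one-time-password check is never reached.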
GTIG's report describes the kind of flaw that has historically eluded automated tools. "While fuzzers and static analysis tools are optimized to detect sinks and crashes, frontier LLMs excel at identifying these types of high-level flaws and hardcoded static anomalies," the report reads. Logic mistakes, in other words, are exactly the sort of thing modern AI is starting to get surprisingly good at finding.
Google declined to name the affected vendor, citing ongoing remediation. It also declined to identify the criminal group, saying only that it had no evidence the operators were tied to an adversarial government. That detail matters. State-backed actors typically work slowly and carefully, keeping their footprint small enough that individual operators are hard to trace. Criminal hackers do not. They scale aggressively. They reuse infrastructure. And, as Hultquist put it, they have the most to gain from speed.
"There's a race between you and them to stop them before they can essentially get whatever data they need to extort you with, or launch ransomware," Hultquist said in an interview. "AI is going to be a huge advantage because they can move a lot faster."
Google alerted the unnamed vendor, which quietly built a patch before the campaign could properly start. GTIG believes the disruption may have prevented the operation from gaining any traction. The vendor has since pushed the fix. The criminals, presumably, have moved on to other targets.
The Hallucination That Saved Everyone
The strangest detail in the GTIG report is also the most reassuring: the AI-built exploit was not quite finished.
According to Google's analysis, mistakes in the exploit's implementation probably interfered with the criminals' plans this time around. The model had hallucinated the CVSS score. It had left those educational docstrings in the code, the kind of "this function performs X by doing Y" annotations that show up in tutorial repositories but never in operational malware. The polish of the textbook coding structure was, paradoxically, evidence that no experienced human had touched the file.
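GTIG did not release the recovered script, but the tells it describes are easy to picture. A purely illustrative sketch, reusing the invented names from the earlier snippet and an invented CVE identifier:

```python
# Illustrative only, not the recovered script. The CVE identifier and
# score below are invented to show the tells GTIG describes.

CVE_ID = "CVE-2026-99999"  # invented identifier
CVSS_SCORE = 9.8           # the hallucinated tell: a confident score for
                           # a vulnerability that had not yet been scored

def build_bypass_request(target_url: str) -> dict:
    """Construct the login request that triggers the 2FA bypass.

    This function performs the bypass by setting the trusted User-Agent
    header, causing the server to skip the one-time-password check.
    """
    # Docstrings like the one above read like a tutorial repository.
    # Operational malware written by experienced humans almost never
    # explains itself this politely.
    return {
        "url": f"{target_url}/login",
        "headers": {"User-Agent": "acme-sync-service/1.0"},
    }
```

Nothing about such code would fail to run; it simply announces what it is to anyone who reads it.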
That clumsiness is the early-phase signature of a technology that is still learning the shape of its new job. It is also the reason Google's defenders were able to identify the work for what it was.
"There's a misconception that the AI vulnerability race is imminent," Hultquist said. "The reality is that it's already begun. For every zero-day we can trace back to AI, there are probably many more out there."
That line is the one cybersecurity leaders will be quoting for the rest of 2026. It carries an admission: GTIG caught one. The ones GTIG did not catch are operating against unknown targets, with unknown effectiveness, on unknown timelines.
A Wider Pattern Was Already Visible
The zero-day case is the most dramatic finding in GTIG's report, but it is not the only one. Google's analysts describe a broader pattern in which state actors and criminal groups alike are integrating AI into the operational core of their work.
The North Korean crew tracked as APT45 has been using AI to churn through thousands of exploit checks and bulk out its toolkit. Chinese state-linked operators are experimenting with AI systems for vulnerability hunting and automated probing of targets. Malware families have started shipping with AI-generated junk code padded into the body of the payload, an obfuscation technique designed to slow down human analysts. Android backdoors have been spotted calling Gemini APIs to autonomously move through infected devices. Russian influence operations have begun stitching fabricated AI-generated audio into legitimate news footage.
The picture GTIG draws is one in which AI is no longer a phishing aid or a chatbot trick. It is part of the offensive stack.
The Defensive Side Is Already Spending
The defensive side has not been sitting still. Anthropic restricted Mythos to a small group of trusted partners after warning that the model was "strikingly capable" at hacking and cybersecurity work. It then organized an initiative called Project Glasswing that brought Amazon, Apple, Google, Microsoft, and JPMorgan Chase together to harden critical software before the model's broader release.
OpenAI followed weeks later. On Friday, May 8, the company released its own defensive cyber model, a specialized ChatGPT variant restricted to defenders responsible for securing critical infrastructure. Mozilla credited the Mythos preview the same week with helping to patch 423 Firefox security bugs.
The bet is straightforward: if AI can find vulnerabilities, AI must also be deployed to find and patch them first. Hultquist's report is the first hard evidence that the bet is no longer optional.
For practitioners shipping LLM-integrated tools, the GTIG case carries an additional warning. The attackers were not abusing the AI's intended capability. They were using a general-purpose coding model the way any engineer might. The line between "model that writes code" and "model that writes exploits" turns out to be thin, contextual, and difficult to enforce.
The Other Side: Skeptics Argue the Threat Is Overstated
Not everyone reads the GTIG report the same way.
Some security researchers point out that the criminals' exploit failed to launch cleanly, that the AI's mistakes were the very reason the operation was caught, and that frontier LLMs remain expensive, slow, and prone to hallucination in adversarial workflows. The mass-exploitation campaign GTIG describes never produced a victim list. It produced a polished but flawed proof-of-concept that Google's defenders picked apart in time to patch.
Others note that the volume of zero-days discovered by humans dwarfs any plausible AI contribution today. Independent bug bounty researchers, Google's in-house Project Zero, Trail of Bits, and dozens of corporate red teams still find most reported vulnerabilities. AI, in this view, accelerates a niche of the work, not the whole pipeline.
There is also a regulatory critique. Dean Ball, a senior fellow at the Foundation for American Innovation who served as a White House tech policy adviser and was a lead author of Trump's AI policy roadmap, is one of the few conservatives on record asking for federal action on AI in cybersecurity.
"Some people don't want there to be a regulatory response to this and others do," Ball said. "I don't like regulation. I would prefer for things not to be regulated. But I think we need to in this case."
Ball is optimistic that, over the long run, AI tools good at coding will make routine cyberattacks against hospitals and schools easier to defend against. The problem is the interval. There are, by his estimate, "untold trillions of lines of software code" supporting the world's computing systems, and AI tools unleashed to exploit all of them can cause years of damage before defenders can harden everything. Ball predicts a "transitional period" where cybersecurity risks rise significantly and "the world might actually be more dangerous."
The Bottom Line
A criminal crew used a language model to find a real bug in a real production tool. The model wrote real exploit code that targeted a real two-factor authentication bypass. The operation was meant to scale into a mass-exploitation event. Google caught it because the AI made beginner mistakes. That is the entire story, and it is enough.
The implication for everyone shipping software is uncomfortable. The barrier to a working zero-day used to be expertise: years of reverse engineering, fuzzing, dynamic analysis, and intuition. The model that wrote this exploit is not a senior researcher. It is a tool a small criminal team rented for a price most ransomware crews already pay for cloud compute. The next iteration will leave fewer fingerprints. The iteration after that will not need the criminals to be skilled at all.
For senior engineers at companies that ship anything resembling an administration tool, an authentication flow, or an open-source library used at scale, the to-do list just got longer. Audit the trust exceptions; a cheap first pass is sketched below. Pressure-test authentication logic for the high-level flaws fuzzers miss. Assume that a model somewhere is reading your repository the way an experienced attacker would. The defensive side has its own AI. The race is whether it gets deployed before the offensive side does.
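What that audit might look like in practice: the sketch below, assuming a Python codebase with an src/auth directory (a guess about your layout), flags every comparison against a hard-coded string literal in the authentication path.

```python
# A minimal sketch of "audit the trust exceptions": flag comparisons
# against hard-coded string literals in the code paths that guard
# authentication. Heuristic and noisy by design. The src/auth path is
# an assumption about your repository layout.

import ast
import pathlib

def find_hardcoded_trust_checks(root: str) -> list[tuple[pathlib.Path, int]]:
    hits = []
    for path in pathlib.Path(root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(), filename=str(path))
        except SyntaxError:
            continue
        for node in ast.walk(tree):
            # Catch patterns like `agent == "acme-sync-service/1.0"`:
            # a string literal appearing inside a comparison.
            if isinstance(node, ast.Compare) and any(
                isinstance(c, ast.Constant) and isinstance(c.value, str)
                for c in node.comparators
            ):
                hits.append((path, node.lineno))
    return hits

if __name__ == "__main__":
    for path, line in find_hardcoded_trust_checks("src/auth"):
        print(f"{path}:{line}: comparison against a hard-coded string")
```

Most of what a scan like this surfaces will be harmless. The point is that a hard-coded bypass cannot hide from even a crude pass, which is more than can be said for the review cycle that shipped the original flaw.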
"For every zero-day we can trace back to AI, there are probably many more out there," Hultquist said. He published one. He kept the rest of the file closed.
Sources
- Google says criminals used AI-built zero-day in planned mass hack spree (The Register, May 11, 2026)
- 'It's here': Google issues dire warning after catching hackers using AI to break into computers (Fortune / AP, May 11, 2026)
- Google Says Hacker Used Mythos-Like AI for Software Tool Exploit (Bloomberg, May 11, 2026)
- Google disrupts hackers using AI to exploit an unknown weakness in a company's digital defense (NBC News / AP, May 11, 2026)
- Google says it likely thwarted effort by hacker group to use AI for 'mass exploitation event' (CNBC, May 11, 2026)
- Japan's PM orders cybersecurity review to stop Mythos going full CyberZilla (The Register, May 12, 2026)
- Mozilla says AI helped squash 423 Firefox security bugs (The Register, May 8, 2026)