Claude Opus Generates Chrome V8 Exploit via API

A security researcher at Hacktron used Anthropic's Claude Opus to build a working Chrome V8 exploit chain against an outdated Chromium bundle in Discord. The experiment consumed 2.3 billion tokens, cost $2,283 in API fees, and required about 20 hours of human guidance. The researcher supplied known V8 patch data and iteratively guided Opus 4.6 through debugging dead ends until the exploit executed, popping the calculator app as proof of code execution. The result underscores that mainstream code-generation models, not only closed high-risk variants, are now capable of practical exploit development when paired with persistent operators. The immediate takeaway for practitioners is that patch lag in Electron-based apps and slow update rollouts materially increase exposure to AI-accelerated weaponization.
What happened
A researcher at Hacktron used Anthropic's frontier model Claude Opus to develop a working exploit chain against the V8 JavaScript engine in the outdated Chromium build bundled with Discord. The exercise consumed 2.3 billion tokens, cost $2,283 in API usage, and required about 20 hours of operator intervention before the exploit executed, demonstrated by popping the calculator application. The exploit relied on a known out-of-bounds (OOB) V8 bug from Chrome 146, while the Chromium bundled in Discord was version 138, nine major versions behind upstream.
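To put the reported figures in proportion, the short calculation below derives a blended cost per million tokens and a token throughput per operator hour. It is only arithmetic on the three numbers quoted above; the function names are illustrative, and the result is an average over the whole run, not a statement about any published price list.

```python
# Figures quoted in the article.
TOKENS_USED = 2_300_000_000   # 2.3 billion tokens
API_COST_USD = 2_283          # total API spend in US dollars
OPERATOR_HOURS = 20           # approximate hours of human guidance


def cost_per_million_tokens(cost_usd: float, tokens: int) -> float:
    """Blended (average) price paid per one million tokens."""
    return cost_usd / (tokens / 1_000_000)


def tokens_per_hour(tokens: int, hours: float) -> float:
    """Average number of tokens consumed per hour of operator time."""
    return tokens / hours


blended = cost_per_million_tokens(API_COST_USD, TOKENS_USED)
rate = tokens_per_hour(TOKENS_USED, OPERATOR_HOURS)

print(f"Blended cost: ${blended:.2f} per million tokens")
print(f"Throughput: {rate / 1_000_000:.0f} million tokens per operator hour")
```

On these numbers the run averaged just under one dollar per million tokens and roughly 115 million tokens per hour of human guidance.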
Technical details
The researcher guided Opus 4.6 across multiple sessions, feeding it patch diffs and debugging output, and intervened when the model reached dead ends. Opus 4.7 is characterized as roughly similar to Opus 4.6 in cyber capabilities, and Anthropic states that it has added safeguards in later builds and in Mythos. Key technical points for practitioners:
- The model translated patch descriptions and public CVE information into actionable exploitation steps, lowering the time and skill barrier for proof-of-concept creation.
- Attack development was iterative: the model proposed strategies, produced exploit drafts, and required human feedback for debugging and pivoting.
- The experiment targeted an Electron-style app with bundled Chromium, highlighting the practical risk of version lag.
Context and significance
This demonstration validates concerns that advanced code-generation models materially accelerate the conversion of disclosed patches into working exploits. Historically, reverse-engineering patches to build exploits required specialist skills and substantial time. Now, an API budget plus a persistent operator can compress that effort dramatically: here it took $2,283 and about 20 hours. The finding is significant for three overlapping reasons. First, many widely deployed desktop and cross-platform apps like Discord, Teams, Notion, and Slack commonly ship older Chromium builds, creating a rich attack surface. Second, the experiment shows that even models not explicitly labeled high-risk, when available via API and combined with human operators, can produce weaponizable code. Third, vendor vulnerability reward markets create real incentives for both legitimate and criminal buyers, so exploitability is not merely academic.
What to watch
Security teams must prioritize reducing patch lag, harden Electron-based deployments, and treat AI-assisted exploitation as a realistic threat vector. Monitoring for suspicious token spending patterns, enforcing stricter API usage policies, and accelerating coordinated disclosure and patch deployment are immediate operational steps. "Whether Mythos is overhyped or not doesn't matter," said Mohan Pedhapati of Hacktron, "the curve isn't flattening. If not Mythos, then the next version, or the one after that." Expect successive model iterations to require progressively less human scaffolding.
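As a rough illustration of what monitoring for suspicious token spending could look like, the sketch below flags days on which an API key's usage far exceeds its typical level. Everything here is an assumption made for illustration: the usage log, the `flag_spend_spikes` helper, and the 5x multiplier are hypothetical, and a real deployment would draw on whatever usage reporting the API provider actually exposes.

```python
from statistics import median


def flag_spend_spikes(daily_tokens: dict[str, int], multiplier: float = 5.0) -> list[str]:
    """Return the days whose token usage exceeds `multiplier` times the median day."""
    if not daily_tokens:
        return []
    baseline = median(daily_tokens.values())
    return [day for day, used in daily_tokens.items() if used > multiplier * baseline]


# Hypothetical usage log for a single API key (tokens per day).
usage = {
    "day-1": 4_000_000,
    "day-2": 5_500_000,
    "day-3": 3_800_000,
    "day-4": 310_000_000,  # a sustained, exploit-scale burst would look like this
    "day-5": 4_200_000,
}

spikes = flag_spend_spikes(usage)
print(spikes)
```

The median is used as the baseline rather than the mean so that a single very large day does not raise the threshold it is being compared against.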
Implications for defenders
Shift from assuming human-only exploit development to preparing for AI-assisted adversaries. Enforce a minimum acceptable Chromium version for bundled runtimes, instrument runtime protections, and adopt rapid deployment pipelines for hotfixes. For defenders evaluating risk, this is a practical wake-up call: model access plus patience and budget now substitute for deep exploitation expertise.
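One way to act on the Chromium version advice is to treat it as a version floor and audit bundled runtimes against it. The sketch below assumes an inventory mapping app names to bundled Chromium version strings. Only Discord's major version 138 comes from the article; the full version strings, the `ExampleApp` entry, the helper names, and the floor of 146 are hypothetical, and the floor is not a statement about which release carries the fix.

```python
def chromium_major(version: str) -> int:
    """Extract the major version from a dotted Chromium version string."""
    return int(version.split(".")[0])


def below_minimum(bundled: dict[str, str], minimum_major: int) -> dict[str, int]:
    """Map each app below the floor to the number of major versions it lags."""
    lagging = {}
    for app, version in bundled.items():
        major = chromium_major(version)
        if major < minimum_major:
            lagging[app] = minimum_major - major
    return lagging


# Hypothetical inventory; only the major version 138 for Discord is from the article.
inventory = {
    "Discord": "138.0.0.0",
    "ExampleApp": "146.0.0.0",
}
MINIMUM_MAJOR = 146  # hypothetical policy floor, chosen for illustration

lagging = below_minimum(inventory, MINIMUM_MAJOR)
print(lagging)
```

Run in a build or audit pipeline, a check like this turns patch lag into a number that can fail a release instead of a risk that is discovered after the fact.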
Scoring Rationale
This is a major, practitioner-relevant demonstration that mainstream models materially lower the bar for exploit creation; it forces changes in defensive posture and software update practices. The score deducts 0.5 for same-week freshness considerations.


