Anthropic Proposes Cross-Industry Framework For Scoring AI Jailbreak Severity

July 3, 2026 | By LDS Team

Relevance Score: 7.1

On July 2, 2026, Anthropic published a Cyber Jailbreak Severity (CJS) framework, built with Amazon, Microsoft, and Google, that defines a shared five-tier scale (CJS-0 to CJS-4) for rating how dangerous an AI jailbreak actually is. The framework followed a 19-day US Commerce Department export-control order that pulled Claude Fable 5 and Mythos 5 offline worldwide from June 12 to July 1; the order came after Amazon researchers found a jailbreak that got Fable 5 to flag software vulnerabilities and, in one case, write exploit-demonstration code. CJS scores four axes (capability gain, breadth of capability gain, ease of weaponization, and discoverability) on an exponential scale. In Anthropic's own worked example, a jailbreak that surfaced the Log4Shell vulnerability before its 2021 disclosure would score CJS-4 (Critical), while the identical capability today, now that Log4Shell is textbook knowledge, scores CJS-0 (Informational). Anthropic is soliciting feedback from academia, government, and industry, and is running a HackerOne bug-bounty program for jailbreaks found in Fable 5.

The most consequential detail in Anthropic's new jailbreak-severity framework may be a definition choice: severity is scored against the tools an attacker already has, not against some fixed danger threshold, so the same model behavior can swing from Critical to zero depending on what else is publicly known at the time. That design answers the specific failure mode that took Fable 5 offline for 19 days: a single report of a jailbreak, evaluated without a shared severity vocabulary, was enough to trigger a government shutdown of a commercial AI model with no formal process for weighing how serious the underlying finding actually was.

What happened

Alongside Fable 5's global redeployment, Anthropic published a first draft of a Cyber Jailbreak Severity (CJS) scale developed jointly with Amazon, Microsoft, Google, and other partners in its Glasswing coalition. CJS rates jailbreaks from CJS-0 (Informational) through CJS-4 (Critical) on four axes: capability gain (how far a technique takes an attacker beyond tools they already have), breadth of capability gain (how many distinct offensive tasks it works on), ease of weaponization (how much effort it takes to turn the technique into a working attack), and discoverability (how easily a threat actor could find it). The bands are exponential rather than linear: Anthropic says each level represents several times more real-world risk than the one below it. Anthropic also published a companion post detailing exactly what Fable 5's cybersecurity classifiers do and do not block, and opened a HackerOne bug-bounty program specifically for cyber jailbreaks found in the model.
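To make the exponential banding concrete, here is a minimal Python sketch. The multiplier `factor` is a hypothetical parameter: Anthropic is quoted only as saying each level carries several times more risk than the one below, and nothing here reproduces a formula from the framework itself. The tier and function names are likewise illustrative.

```python
from enum import IntEnum


class CJSTier(IntEnum):
    """The five CJS tiers. Only the CJS-0 and CJS-4 labels are named in this article."""
    CJS_0 = 0  # Informational
    CJS_1 = 1
    CJS_2 = 2
    CJS_3 = 3
    CJS_4 = 4  # Critical


def relative_risk(tier: CJSTier, factor: float = 3.0) -> float:
    """Risk weight relative to CJS-0, assuming each tier is `factor` times the one below.

    `factor` is an illustrative assumption, not a value taken from the framework.
    """
    if factor <= 1:
        raise ValueError("an exponential scale needs factor > 1")
    return factor ** int(tier)


if __name__ == "__main__":
    for tier in CJSTier:
        print(f"CJS-{int(tier)}: {relative_risk(tier):g}x the CJS-0 weight")
```

Under the assumed factor of 3, a CJS-4 finding carries 81 times the weight of a CJS-0 finding, where a linear five-step scale would suggest a far smaller gap. That difference is the point of exponential bands.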

Timeline

June 12, 2026
The US Commerce Department invoked export-control authority to suspend Fable 5 and Mythos 5 access for foreign nationals worldwide after Amazon researchers reported a jailbreak that got Fable 5 to identify software vulnerabilities and, in one case, generate exploit-demonstration code; unable to verify user nationality in real time, Anthropic took both models offline for everyone.
June 30, 2026
Commerce lifted the export controls after Anthropic trained a new safety classifier, independently tested by the Commerce Department's Center for AI Standards and Innovation, that blocks the reported jailbreak technique in more than 99% of cases.
July 2, 2026
Fable 5 returned globally across Claude.ai, the Claude Platform, Claude Code, and Claude Cowork, and Anthropic published the CJS framework jointly with its Glasswing partners.

Technical context

Anthropic's own worked examples show how much the scale depends on timing. A hypothetical jailbreak that surfaced the Log4Shell vulnerability before its public disclosure in December 2021 scores CJS-4 (Critical) when a novice user unlocks it, because no scanner or model could find that flaw at the time; the identical jailbreak, run against a present-day codebase, scores CJS-0, because Log4Shell is now public knowledge that every scanner catches. Anthropic disputes that the Amazon-reported finding which triggered the shutdown was itself especially severe: the company says the same vulnerability identification was replicable by weaker models including Claude Opus 4.8, GPT-5.5, and Kimi K2.7, and that the specific exploit-demonstration code was reproduced by every model it tested, suggesting the jailbreak exposed no capability unique to Fable 5.
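The timing effect in the Log4Shell example can be sketched in a few lines of Python. This illustrates baseline-relative scoring only. The `Finding` type, the `publicly_known` set, and the rule that maps a novel finding straight to the top tier are hypothetical simplifications. Actual CJS scoring, as described above, also weighs breadth, ease of weaponization, and discoverability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A capability a jailbreak unlocks (hypothetical type for illustration)."""
    vulnerability: str


def capability_gain_tier(finding: Finding, publicly_known: frozenset[str]) -> int:
    """Return a CJS tier (0-4) from capability gain alone.

    Simplifying assumption: a finding that existing tools already cover adds
    nothing for the attacker (tier 0); a finding no existing tool covers is
    treated as the top tier, matching the two endpoints of the Log4Shell example.
    """
    return 0 if finding.vulnerability in publicly_known else 4


log4shell = Finding("Log4Shell")

# Before the December 2021 disclosure: no scanner or model knows the flaw.
before = capability_gain_tier(log4shell, frozenset())
# Today: the flaw is public knowledge that scanners catch.
today = capability_gain_tier(log4shell, frozenset({"Log4Shell"}))

print(f"before disclosure: CJS-{before}, today: CJS-{today}")
```

The model output is identical in both calls; only the baseline of what attackers can already obtain changes, and with it the tier.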

For practitioners

A shared severity vocabulary matters less as a compliance checkbox than as a coordination tool: it gives red-teamers a consistent way to escalate findings, gives labs a defensible basis for choosing between blocking access and patching quietly, and gives policymakers something more precise than "dangerous" or "not dangerous" to write rules around. For security teams evaluating multiple model vendors, CJS, if adopted beyond its four initial backers, would be the first apples-to-apples way to compare how AI labs report and respond to jailbreak disclosures, a role that CVSS already plays for conventional software vulnerabilities.

What to watch

CJS is explicitly a first draft: Anthropic has not set a timeline for finalizing it, has not said how partner labs will resolve disagreements when they score the same jailbreak differently, and has not named who arbitrates a dispute. Whether OpenAI, Meta, and other major labs outside the initial Glasswing group adopt or counter-propose their own standard will determine if CJS becomes an industry-wide reference the way CVSS did, or stays a four-company framework. Separately, the legal question the shutdown raised, whether the government's "deemed export" doctrine now applies to commercially hosted AI models accessed by foreign nationals over the internet, remains unresolved and could resurface with the next reported jailbreak at any lab.

Key Points

1. Anthropic published a five-tier Cyber Jailbreak Severity scale with Amazon, Microsoft, and Google to standardize how AI labs rate jailbreak danger.
2. The framework follows a 19-day US export-control shutdown of Claude Fable 5, triggered when Amazon reported a jailbreak the government treated as severe.
3. A shared severity vocabulary could let labs, researchers, and regulators triage jailbreak disclosures consistently instead of each inventing its own risk bar.

Scoring Rationale

A genuine first attempt at a cross-lab, CVSS-style severity standard for AI jailbreaks, backed by Amazon, Microsoft, and Google, and directly motivated by a historic 19-day government-ordered shutdown of a frontier model. Verified against Anthropic's own framework document plus two independent outlets (TechTimes, AI Weekly) that corroborate the timeline, the classifier's 99%+ block rate, and Anthropic's disputed-severity counter-argument; score nudged up slightly from 7.0 given the strength and consistency of independent corroboration.

Sources

Public references used for this report.

More details on Fable 5's cyber safeguards and our jailbreak framework (anthropic.com)

Claude Fable 5 Returns Globally: New Classifier Blocks Jailbreak, Flags More Code (techtimes.com)

Anthropic Redeploys Fable 5 With Cross-Lab Jailbreak Rubric (aiweekly.co)