Cisco Finds Multi-Turn Attacks Break Frontier Models

Cisco's AI Threat Intelligence and Security Research team evaluated 15 flagship closed models from OpenAI, Anthropic, Google, Amazon, and xAI and found that multi-turn adversarial attacks produce much higher failure rates than single-prompt benchmarks. Per Cisco's report, the evaluation used 30,090 single-turn prompts and 6,986 multi-turn attacks across 1,456 conversations; single-turn attack success rates (ASR) ranged 2.19% to 64.91%, while multi-turn ASR ranged 7.89% to 88.30%. Cisco states that the two regimes yield different model rankings, failure maps, and tail-risk profiles and that every model tested exhibited non-trivial multi-turn ASR. "These benchmarks inform model cards, safety reports, and procurement decisions across the industry, but they all only measure one narrow slice of attacker behavior," Cisco researchers write. Industry reporting on the study appears in Network World and CSO Online, summarizing the same findings.
What happened
Cisco's AI Threat Intelligence and Security Research team published a paired-regime evaluation of 15 closed/proprietary frontier models from OpenAI, Anthropic, Google, Amazon, and xAI, testing a fixed adversarial corpus of 30,090 single-turn prompts and 6,986 multi-turn attacks across 1,456 conversations, per Cisco's report. The report lists the specific models tested, including GPT-5.2 and the GPT-5.4 family (OpenAI), Claude Opus 4.5/4.6, Sonnet 4.5/4.6, Haiku 4.5 (Anthropic), Gemini 3 Pro (Google), Nova Lite/Nova Micro/Nova 2 Lite (Amazon), and Grok 4.1 Fast (xAI). Cisco documents single-turn ASR from 2.19% to 64.91% and multi-turn ASR from 7.89% to 88.30%, and reports that every model showed non-trivial susceptibility under iterative attack. The report includes the explicit observation: "These benchmarks inform model cards, safety reports, and procurement decisions across the industry, but they all only measure one narrow slice of attacker behavior."
Editorial analysis - technical context
Single-turn benchmarks measure an isolated adversarial prompt and the model's immediate response. Cisco demonstrates that real-world attackers typically iterate: they decompose tasks, adopt personas, reframe refusals, and escalate across turns. Industry-pattern observations show that evaluation regimes which do not simulate iterative attacker behavior systematically miss these escalation strategies, producing optimistic ASR estimates for runtime deployments. Cisco's paired-regime methodology exposes how model state, context retention, and guardrail enforcement interact over multiple turns in ways single-shot tests cannot capture.
What Cisco reported about model differences
Per Cisco's analysis, the paired regimes produce different model orderings, different failure maps, and different tail-risk pictures. Cisco also reports a correlation in both their open-weight and closed-model studies: models with larger single-to-multi-turn ASR gaps tended to come from labs whose public communications emphasize capability advancement, while narrower gaps were more common among labs that emphasize safety publicly, according to the report.
Industry context
Industry coverage in Network World and CSO Online echoes Cisco's conclusion that many procurement decisions rely on single-turn safety scores. Reporting emphasizes that enterprises using vendor-published safety benchmarks and runtime guardrails may underestimate adversarial risk if they do not require or conduct multi-turn red-team evaluations. For practitioners, this shifts the risk calculus for model selection, runtime monitoring, and adversarial testing requirements.
For practitioners
Observed patterns in comparable evaluations indicate organizations should expand threat models to include iterative prompt-injection and stateful exploitation. Key technical indicators to instrument include how refusal logic persists across conversation turns, how chained decompositions bypass filters, and where context leakage occurs when attackers reframe requests.
What to watch
- •Vendor-provided multi-turn red-team results and their methodology transparency
- •Metrics that capture tail risk, e.g., multi-turn ASR and escalation success rate
- •Runtime guardrail composition: stateful filters, session-level sanitization, and decomposition detection
- •Public reproducibility: whether third-party auditors can replicate multi-turn failure modes
Scoring Rationale
The study reveals a cross-vendor, systematic blind spot in widely used safety benchmarks that directly affects procurement and runtime risk assessment. The finding is technically important for security and SRE teams but not a new architectural breakthrough.
Practice with real Telecom & ISP data
90 SQL & Python problems · 15 industry datasets
250 free problems · No credit card
See all Telecom & ISP problems

