
Cisco Finds Multi-Turn Attacks Break Frontier Models

June 1, 2026 | By LDS Team

Relevance Score: 7.6


Cisco's AI Threat Intelligence and Security Research team evaluated 15 flagship closed models from OpenAI, Anthropic, Google, Amazon, and xAI and found that multi-turn adversarial attacks produce much higher failure rates than single-prompt attacks. Per Cisco's report, the evaluation used 30,090 single-turn prompts and 6,986 multi-turn attacks across 1,456 conversations; single-turn attack success rates (ASR) ranged from 2.19% to 64.91%, while multi-turn ASR ranged from 7.89% to 88.30%. Cisco states that the two regimes yield different model rankings, failure maps, and tail-risk profiles, and that every model tested exhibited non-trivial multi-turn ASR. "These benchmarks inform model cards, safety reports, and procurement decisions across the industry, but they all only measure one narrow slice of attacker behavior," Cisco researchers write. Network World and CSO Online have also covered the study, summarizing the same findings.

What happened

Cisco's AI Threat Intelligence and Security Research team published a paired-regime evaluation of 15 closed/proprietary frontier models from OpenAI, Anthropic, Google, Amazon, and xAI, testing a fixed adversarial corpus of 30,090 single-turn prompts and 6,986 multi-turn attacks across 1,456 conversations, per Cisco's report. The report lists the specific models tested, including GPT-5.2 and the GPT-5.4 family (OpenAI), Claude Opus 4.5/4.6, Sonnet 4.5/4.6, Haiku 4.5 (Anthropic), Gemini 3 Pro (Google), Nova Lite/Nova Micro/Nova 2 Lite (Amazon), and Grok 4.1 Fast (xAI). Cisco documents single-turn ASR from 2.19% to 64.91% and multi-turn ASR from 7.89% to 88.30%, and reports that every model showed non-trivial susceptibility under iterative attack. The report includes the explicit observation: "These benchmarks inform model cards, safety reports, and procurement decisions across the industry, but they all only measure one narrow slice of attacker behavior."

Editorial analysis - technical context

Single-turn benchmarks measure one isolated adversarial prompt and the model's immediate response. Cisco's argument is that real-world attackers iterate: they decompose tasks, adopt personas, reframe requests after refusals, and escalate across turns. Evaluation regimes that do not simulate this iterative behavior miss these escalation strategies and therefore produce optimistic ASR estimates for runtime deployments, which is consistent with the gap between Cisco's single-turn and multi-turn ranges. Cisco's paired-regime methodology exposes how model state, context retention, and guardrail enforcement interact over multiple turns in ways single-shot tests cannot capture.
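To make the difference between the two regimes concrete, here is a minimal Python sketch of how ASR might be scored in each. It assumes a hypothetical judge has already labeled every model response as safe or unsafe, and it counts a multi-turn attack as successful if any turn produced an unsafe output. The names `Conversation`, `single_turn_asr`, and `multi_turn_asr` are illustrative; Cisco's exact scoring unit and judging procedure are not described here.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Conversation:
    """One multi-turn attack. Each entry is a (hypothetical) judge verdict
    for one model turn: True means the response was judged unsafe."""
    turn_verdicts: Sequence[bool]


def single_turn_asr(verdicts: Sequence[bool]) -> float:
    """Fraction of isolated prompts whose single response was judged unsafe."""
    if not verdicts:
        raise ValueError("no prompts evaluated")
    return sum(verdicts) / len(verdicts)


def multi_turn_asr(conversations: Sequence[Conversation]) -> float:
    """Fraction of conversations in which ANY turn was judged unsafe.

    An attacker only needs one turn to break through, so success is counted
    once per conversation rather than once per turn (an assumption of this
    sketch, not necessarily Cisco's scoring rule).
    """
    if not conversations:
        raise ValueError("no conversations evaluated")
    broken = sum(any(c.turn_verdicts) for c in conversations)
    return broken / len(conversations)


if __name__ == "__main__":
    # Toy verdicts for illustration only; not data from the study.
    print(single_turn_asr([False, False, True, False]))  # 0.25
    convs = [
        Conversation([False, False, True]),
        Conversation([False, False]),
        Conversation([False, True]),
        Conversation([False]),
    ]
    print(multi_turn_asr(convs))  # 0.5
```

Scoring success per conversation is what makes the multi-turn number sensitive to persistence: a model that refuses nine times and complies on the tenth still counts as broken.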

What Cisco reported about model differences

Per Cisco's analysis, the paired regimes produce different model orderings, different failure maps, and different tail-risk pictures. Cisco also reports a correlation that held in both its open-weight and closed-model studies: models with larger single-to-multi-turn ASR gaps tended to come from labs whose public communications emphasize capability advancement, while narrower gaps were more common among labs that publicly emphasize safety.
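Whether two regimes produce different orderings can be checked mechanically once per-model ASR is available for both. The sketch below uses hypothetical helper names (`rank_models`, `regime_gaps`, `rank_shifts`) and toy numbers that are not Cisco's results; it ranks models under each regime and reports each model's single-to-multi-turn gap and how far it moves in the ranking.

```python
from typing import Dict, List


def rank_models(asr: Dict[str, float]) -> List[str]:
    """Order models from most robust (lowest ASR) to least robust.
    Ties are broken alphabetically so the ordering is deterministic."""
    return sorted(asr, key=lambda name: (asr[name], name))


def regime_gaps(single: Dict[str, float], multi: Dict[str, float]) -> Dict[str, float]:
    """Multi-turn ASR minus single-turn ASR per model, in percentage points."""
    if single.keys() != multi.keys():
        raise ValueError("both regimes must cover the same models")
    return {name: multi[name] - single[name] for name in single}


def rank_shifts(single: Dict[str, float], multi: Dict[str, float]) -> Dict[str, int]:
    """Positive values mean a model looks worse (ranks lower) under multi-turn testing."""
    before = {name: i for i, name in enumerate(rank_models(single))}
    after = {name: i for i, name in enumerate(rank_models(multi))}
    return {name: after[name] - before[name] for name in before}


if __name__ == "__main__":
    # Hypothetical ASR percentages for three toy models.
    single = {"model_a": 5.0, "model_b": 10.0, "model_c": 20.0}
    multi = {"model_a": 60.0, "model_b": 25.0, "model_c": 40.0}
    print(rank_models(single))          # ['model_a', 'model_b', 'model_c']
    print(rank_models(multi))           # ['model_b', 'model_c', 'model_a']
    print(regime_gaps(single, multi))   # {'model_a': 55.0, 'model_b': 15.0, 'model_c': 20.0}
    print(rank_shifts(single, multi))   # {'model_a': 2, 'model_b': -1, 'model_c': -1}
```

In this toy case the model that looks safest on single-turn prompts has the largest gap and drops to last place, which is the kind of reordering a single-turn-only procurement check would not surface.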

Industry context

Industry coverage in Network World and CSO Online echoes Cisco's conclusion that many procurement decisions rely on single-turn safety scores. Reporting emphasizes that enterprises using vendor-published safety benchmarks and runtime guardrails may underestimate adversarial risk if they do not require or conduct multi-turn red-team evaluations. For practitioners, this shifts the risk calculus for model selection, runtime monitoring, and adversarial testing requirements.

For practitioners

Cisco's results suggest that organizations should expand threat models to cover iterative prompt injection and stateful exploitation. Key technical indicators to instrument include whether refusals persist across conversation turns, whether chained decompositions bypass filters, and where context leaks when attackers reframe requests.
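One way to instrument refusal persistence is to track, per conversation, which topics the model has already refused and flag any later turn where it complies on the same topic. This sketch assumes a hypothetical upstream classifier that assigns each turn a topic label and a refused/complied flag; the `Turn` type and `refusal_reversals` function are illustrative, and real deployments would need fuzzier topic matching than exact string equality.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Turn:
    topic: str     # label from a hypothetical intent/topic classifier
    refused: bool  # True if the model declined this turn


def refusal_reversals(turns: List[Turn]) -> List[int]:
    """Return indices of turns where the model complied on a topic it had
    already refused earlier in the same conversation."""
    refused_topics: Set[str] = set()
    reversals: List[int] = []
    for i, turn in enumerate(turns):
        if turn.refused:
            refused_topics.add(turn.topic)
        elif turn.topic in refused_topics:
            # Compliance after an earlier refusal on the same topic: a
            # candidate escalation or reframing success worth reviewing.
            reversals.append(i)
    return reversals


if __name__ == "__main__":
    convo = [
        Turn("topic_x", refused=True),
        Turn("topic_y", refused=False),
        Turn("topic_x", refused=False),  # reframed request now succeeds
    ]
    print(refusal_reversals(convo))  # [2]
```

Aggregated over many red-team conversations, the share of conversations with at least one reversal gives a rough escalation-success signal to track alongside ASR.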

What to watch

• Vendor-provided multi-turn red-team results and the transparency of their methodology
• Metrics that capture tail risk, such as multi-turn ASR and escalation success rate
• Runtime guardrail composition: stateful filters, session-level sanitization, and decomposition detection
• Public reproducibility: whether third-party auditors can replicate multi-turn failure modes
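A stateful filter differs from a per-prompt filter in that it remembers risk from earlier turns. The sketch below assumes a hypothetical classifier that scores each turn's risk in [0, 1]; the `SessionRiskFilter` class blocks either a single high-risk turn or a run of individually mild turns whose decayed, accumulated risk crosses a session threshold, which is one simple way to approximate decomposition detection. The thresholds and decay factor are illustrative placeholders, not tuned or vendor-recommended values.

```python
from dataclasses import dataclass, field


@dataclass
class SessionRiskFilter:
    """Accumulates per-turn risk so that a request split into several
    individually mild turns can still trip the filter.

    turn_risk values are assumed to come from a hypothetical per-turn
    risk classifier; all defaults below are illustrative only.
    """
    turn_threshold: float = 0.8     # block any single turn scoring above this
    session_threshold: float = 1.5  # block once accumulated risk exceeds this
    decay: float = 0.7              # fraction of earlier risk carried into the next turn
    _accumulated: float = field(default=0.0, init=False)

    def check(self, turn_risk: float) -> bool:
        """Update session state and return True if this turn should be blocked."""
        if not 0.0 <= turn_risk <= 1.0:
            raise ValueError("turn_risk must be in [0, 1]")
        # Exponentially decayed running sum of risk across the session.
        self._accumulated = self.decay * self._accumulated + turn_risk
        return turn_risk > self.turn_threshold or self._accumulated > self.session_threshold


if __name__ == "__main__":
    f = SessionRiskFilter()
    # Four mild turns: none exceeds 0.8 alone, but risk accumulates.
    print([f.check(0.6) for _ in range(4)])  # [False, False, False, True]
    print(SessionRiskFilter().check(0.9))    # True: single-turn block
```

With the defaults, four consecutive turns scoring 0.6 each pass individually, but the accumulated score exceeds 1.5 on the fourth turn. A steady stream of low-risk turns (for example 0.2 each) settles near 0.2 / (1 - 0.7) ≈ 0.67 and is never blocked, so the decay factor controls how long the filter "remembers" a session.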

Key Points

1. Cisco's paired-regime tests show multi-turn attacks raise attack success rates far above single-prompt benchmarks, revealing a systemic evaluation blind spot.
2. Single-turn safety scores and model cards miss the iterative escalation strategies attackers use; accounting for them alters model rankings and tail-risk assessments for deployments.
3. For security teams, adding multi-turn red teaming and tail-risk metrics provides a more realistic picture of operational exposure than single-prompt ASR alone.

Scoring Rationale

The study reveals a cross-vendor, systematic blind spot in widely used safety benchmarks that directly affects procurement and runtime risk assessment. The finding is technically important for security and SRE teams but not a new architectural breakthrough.


Sources

Public references used for this report.


01. cisco.com: "How Frontier Closed Models Collapse Under Iterative Pressure"

02. blogs.cisco.com: "Proprietary Problems: No Frontier Model Is Multi-Turn Immune"

03. networkworld.com: "Cisco research finds standard AI safety benchmarks miss the real threat"

