Open-source Framework Automates AI Agent Red-teaming
LangWatch released Scenario, an open-source framework that automates multi-turn red-team exercises against production AI agents. Scenario shifts testing from single-shot prompt probes to adversarial, multi-step attack campaigns that simulate real-world attacker behavior against chatbots, data agents, and multimodal systems. Complementary work from academia and vendors, including the arXiv release of OpenRT, tooling in Microsoft Foundry, and Anthropic initiatives, shows red-teaming moving toward high-throughput, metric-driven workflows. Practitioners should prioritize integrating automated red-teaming into CI/CD, tracking Attack Success Rate (ASR) metrics, and instrumenting model endpoints and connectors to detect multi-turn exfiltration and logic-jailbreak patterns before deployment.
What happened
LangWatch released Scenario, an open-source framework that runs automated, multi-turn red-team exercises against production AI agents, shifting testing from single-shot prompts to adversarial campaign simulations. In parallel, academic work released as OpenRT and vendor programs from Microsoft Foundry and Anthropic emphasize high-throughput, modular red-teaming. The academic study reports testing 20 advanced models and a peak Attack Success Rate of 49.14%, highlighting real gaps in deployed models such as GPT-5.2, Claude 4.5, and Gemini 3 Pro.
Technical details
Scenario and related projects replace single-prompt probes with orchestrated, stateful attack traces that emulate how attackers iterate. The new architectures commonly use a modular adversarial kernel that separates:
- model integration and adapters
- dataset management and corpus generation
- attack strategy engines
- judging and scoring modules
- evaluation metrics and reporting
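To make the separation concrete, here is a minimal sketch of how such a modular adversarial kernel could be wired together. All class and method names below are illustrative assumptions for this article, not the actual Scenario or OpenRT API:

```python
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class Turn:
    role: str      # "attacker" or "target" (assumed labels)
    content: str

@dataclass
class AttackTrace:
    turns: list[Turn] = field(default_factory=list)
    succeeded: bool = False

class ModelAdapter(Protocol):
    """Model integration layer: wraps one API endpoint or local model."""
    def respond(self, history: list[Turn]) -> str: ...

class AttackStrategy(Protocol):
    """Attack engine: produces the next adversarial turn from the history."""
    def next_prompt(self, history: list[Turn]) -> str: ...

class Judge(Protocol):
    """Scoring module: decides whether a trace counts as a successful attack."""
    def is_success(self, trace: AttackTrace) -> bool: ...

def run_campaign(adapter: ModelAdapter, strategy: AttackStrategy,
                 judge: Judge, max_turns: int = 6) -> AttackTrace:
    """Orchestrate one stateful, multi-turn attack trace against a target model."""
    trace = AttackTrace()
    for _ in range(max_turns):
        prompt = strategy.next_prompt(trace.turns)
        trace.turns.append(Turn("attacker", prompt))
        reply = adapter.respond(trace.turns)
        trace.turns.append(Turn("target", reply))
        if judge.is_success(trace):
            trace.succeeded = True
            break
    return trace
```

Because the adapter, strategy, and judge are independent interfaces, a framework structured this way can swap in any of the catalogued attack methodologies or target models without touching the orchestration loop, which is the point of the modular-kernel design described above.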
OpenRT formalizes that separation and couples it to a high-throughput asynchronous runtime. Research implementations integrate 37 attack methodologies, including white-box gradient-guided perturbations, multimodal pixel and layout perturbations, and multi-agent evolutionary strategies that spawn and mutate attack chains. Key practitioner-facing features you should expect or look for:
- automated generation of multi-turn jailbreak traces and context-aware prompt injections
- standardized scoring such as Attack Success Rate (ASR) and per-category risk breakdowns
- plug-in model adapters and sandboxed execution to test both API endpoints and agentic workflows
- CI/CD hooks and logging outputs to integrate tests into deployment pipelines
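The ASR scoring and CI/CD integration mentioned above can be illustrated with a short sketch. The function names and the 5% threshold are assumptions for illustration, not values prescribed by any of these tools:

```python
def attack_success_rate(results: list[dict]) -> float:
    """ASR = successful attacks / total attack attempts in a red-team run."""
    if not results:
        return 0.0
    successes = sum(1 for r in results if r.get("succeeded"))
    return successes / len(results)

def deploy_gate(results: list[dict], max_asr: float = 0.05) -> bool:
    """CI gate: allow deployment only if measured ASR stays under the threshold.

    In a real pipeline, `results` would be loaded from the red-team
    framework's JSON report and a False return would fail the build.
    """
    asr = attack_success_rate(results)
    print(f"ASR: {asr:.2%} across {len(results)} attack traces")
    return asr <= max_asr
```

Gating on ASR this way turns red-teaming from an ad hoc exercise into a regression test: a model or prompt change that raises the success rate of known attack traces blocks the deploy instead of shipping silently.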
Context and significance
Production AI agents increasingly touch sensitive systems and data stores, so multi-turn tactics that combine subtle prompt injection, context poisoning, and staged exfiltration become the realistic threat model. Vendor efforts like the AI Red Teaming Agent in Microsoft Foundry and Anthropic's Project Glasswing demonstrate two converging trends: defenders want automated, repeatable tooling; attackers will leverage model capabilities to find software vulnerabilities. The empirical finding that leading models still show substantial ASR under complex attack mixes means model capability alone is not a substitute for systematic safety testing. For MLOps and security teams, red-teaming is evolving from ad hoc exercises into engineering-grade tests with metrics, reproducibility, and lifecycle integration.
What to watch
Adoption vectors include embedding red-team runs into pre-deploy CI/CD, standardization of metrics like ASR across vendors, and supply-chain pressure for third-party attestations. Also watch defensive responses: stateful context sanitization, runtime policy enforcement, and hardened agent orchestration. The near-term risk is a capability arms race where automated red-team tooling accelerates both offensive discovery and defensive patching.
Scoring Rationale
Open-source, modular red-teaming frameworks change how teams test production agents and bridge research and operational tooling. The score reflects broad practical impact for MLOps and security teams without representing a paradigm shift.