Defense Training Degrades Agent Tool Competence
Li et al. (arXiv, Mar 19, 2026) evaluate defense-trained LLM agents across 97 agent tasks and 1,000 adversarial prompts. They find that safety-focused defense training systematically degrades tool-use competence while failing to stop sophisticated prompt-injection attacks. They identify three biases (agent incompetence, cascade amplification, and trigger bias) and report that defended models time out on 99% of tasks versus 13% for undefended baselines, and they urge new defense approaches.
Key Points
- Reveal that defense training causes immediate tool-execution failures on benign multi-step agent tasks
- Show that cascade amplification propagates early failures, producing a 99% timeout rate for defended agents
- Indicate that shortcut learning undermines defenses, calling for new methods that preserve tool competence under attack
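The cascade-amplification point can be illustrated with a toy calculation. The sketch below is not the paper's model: it assumes, purely for illustration, that every tool call in a task succeeds independently with the same probability and that a single failed step sinks the whole task. The function name `task_success_probability` and the sample values are hypothetical.

```python
def task_success_probability(step_success: float, num_steps: int) -> float:
    """Chance that every step of a task succeeds.

    Illustrative assumption (not from the paper): steps are independent,
    share one success probability, and any failed step fails the task.
    """
    if not 0.0 <= step_success <= 1.0:
        raise ValueError("step_success must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return step_success ** num_steps


# Hypothetical per-step success rates and task lengths, for illustration only.
for p in (0.99, 0.90, 0.60):
    for n in (1, 5, 10):
        rate = task_success_probability(p, n)
        print(f"per-step success {p:.2f}, {n:2d} steps -> task success {rate:.3f}")
```

Under these assumptions, even a modest drop in per-step reliability compounds quickly as tasks get longer, which is one way to read how early tool failures could translate into a high task-level timeout rate.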
Scoring Rationale
Strong, novel empirical evidence on defenses' harms; limited by single preprint source and lack of peer review.

