Defense Training Degrades Agent Tool Competence

Li et al. (arXiv, Mar 19, 2026) evaluate defense-trained LLM agents across 97 agent tasks and 1,000 adversarial prompts. They find that safety-focused defense training systematically degrades tool-use competence while failing to stop sophisticated prompt-injection attacks. The authors identify three biases (agent incompetence, cascade amplification, and trigger bias) and report that defended models time out on 99% of tasks, versus 13% for undefended baselines. They urge new defense approaches.
Scoring Rationale
Strong, novel empirical evidence of the harms caused by defenses; limited by reliance on a single preprint source that has not been peer reviewed.

