Researchers Demonstrate LLM Training Poisoning Triggers Gibberish

Researchers at Anthropic, the UK AI Security Institute, and the Alan Turing Institute report experiments showing that as few as 250 carefully crafted 'poison' training documents can backdoor a large language model so that it outputs gibberish whenever a specific trigger phrase appears in the prompt. The tests covered models from 600 million to 13 billion parameters and used the trigger phrase '<SUDO>'. Because the attack works when poisoned documents make up a share of the training data on the order of parts per million, the results have implications for dataset hygiene and model provenance.
Key Points
- Show vulnerability in LLMs using just 250 poisoned training documents to trigger output corruption.
- Demonstrate attacks at parts-per-million scale across models from 600M to 13B parameters.
- Warn practitioners to verify outputs and harden data pipelines, ingestion, and provenance checks.
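
As a rough illustration of what a pipeline-side check at this scale might look like, the sketch below counts how many documents in a corpus contain a candidate trigger string and expresses that count in parts per million. It is a minimal example under stated assumptions, not the researchers' method: the function name `trigger_rate_ppm` is hypothetical, the trigger must already be known or suspected, and a plain substring match would miss obfuscated variants.

```python
from typing import Iterable, Tuple


def trigger_rate_ppm(docs: Iterable[str], trigger: str) -> Tuple[int, float]:
    """Count documents containing `trigger` and report the rate in parts per million.

    Hypothetical helper for illustration only: it assumes the candidate
    trigger string is already known and uses a plain substring match.
    """
    total = 0
    hits = 0
    for doc in docs:
        total += 1
        if trigger in doc:
            hits += 1
    # Guard against an empty corpus to avoid dividing by zero.
    ppm = hits * 1_000_000 / total if total else 0.0
    return hits, ppm
```

Because the function consumes any iterable, it can be fed a generator that streams documents from disk rather than a list held in memory. A small absolute hit count can still matter even when the parts-per-million rate looks negligible, which is the point the experiments make.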
Scoring Rationale
Strong empirical demonstration of low-cost poisoning across LLM scales that motivates urgent work on defenses; the result is limited to a gibberish-output backdoor scenario.