Researchers Find Poetry Circumvents Chatbot Safety

Researchers at Italy’s Icaro Lab, a collaboration between Sapienza University and DexAI, published a study, not yet peer reviewed, showing that poetic prompts can bypass chatbot safety guardrails. They tested 20 handcrafted poems against 25 models from Google, OpenAI, Meta, xAI, and Anthropic. The poems achieved an average attack success rate of 62%, and results for individual models ranged from 0% to 100%. Poems generated automatically from a larger prompt corpus succeeded 43% of the time. The study points to an urgent need to harden safety detection against stylistic adversarial attacks.
Key Points
- Demonstrates that handcrafted poetic prompts bypassed chatbot safety measures at an average rate of 62% across 25 commercial models.
- Highlights that stylistic variation can evade safety filters, revealing a systemic vulnerability across major vendors and model sizes.
- Implies that practitioners must strengthen detection and safety pipelines to handle adversarial poetic inputs and improve model robustness.
Scoring Rationale
High practical urgency due to broad, measurable vulnerabilities across major models; confidence is limited because the findings come from a single study that has not been peer reviewed.