Researchers Extract Copyrighted Books From Production Language Models

In a preprint published after a 90-day disclosure window that ended Dec. 9, 2025, Stanford and Yale researchers showed they could extract large portions of copyrighted books from production LLMs, including Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3, with recall rates as high as 95.8 percent for Harry Potter. The results show that memorized text can be extracted despite vendor guardrails, and they may affect ongoing copyright litigation and vendor mitigation strategies.
Key Points
- Researchers extracted substantial copyrighted text from production LLMs, including Claude, GPT-4.1, Gemini, and Grok.
- Memorization persists despite vendor guardrails, raising fair-use and copyright litigation concerns.
- The findings make a case for urgent model audits, stricter training-data transparency, and improved mitigation to reduce legal and privacy risks.
Scoring Rationale
High cross-industry relevance and actionable findings, limited by reliance on a single preprint and pending peer review.