Researchers Reveal LLMs Memorize Training Books

On Tuesday, researchers at Stanford and Yale revealed that four popular large language models (OpenAI’s GPT, Anthropic’s Claude, Google’s Gemini, and xAI’s Grok) can store and reproduce large portions of the books they were trained on. Claude produced near-complete texts of Harry Potter and several classics, behavior consistent with memorization and lossy compression of training data. The finding contradicts industry claims that models do not store their training data, and it exposes companies to copyright liability that could cost the industry billions.
Key Points
- Models reproduce long book excerpts, including near-complete texts of Harry Potter and classic novels
- The results undermine industry claims by showing that models store training data in a way that resembles lossy compression
- Companies face major copyright and legal risks, potentially including billions in costs and forced product removals
Scoring Rationale
Strong novelty and industry-wide legal impact from credible Stanford/Yale research, though the study is limited to thirteen books and the four models tested.