Research: llm, model memorization, copyright, model auditing

Researchers Extract Copyrighted Books From Production Language Models

January 9, 2026 | By LDS Team

Relevance Score: 9.0

In a preprint published following a 90-day disclosure window that ended Dec. 9, 2025, Stanford and Yale researchers showed they could extract large portions of copyrighted books from production LLMs, including Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3, with recall rates of up to 95.8 percent for Harry Potter. The results show that memorized text can be extracted despite vendor guardrails, and they may affect ongoing copyright litigation and vendor mitigation strategies.
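The report does not say how the preprint computes its recall figures. As a rough illustration only, the sketch below shows one simple way an auditor might score how much of a reference passage reappears in model output: the share of reference words that fall inside long verbatim runs. The function name `block_recall`, the `min_words` threshold, and the sample strings are assumptions made for this example, not the researchers' method.

```python
from difflib import SequenceMatcher


def block_recall(reference: str, generated: str, min_words: int = 5) -> float:
    """Fraction of reference words found in the generated text inside
    contiguous verbatim runs of at least `min_words` words.

    Hypothetical metric for illustration; not taken from the preprint.
    """
    ref_words = reference.split()
    gen_words = generated.split()
    if not ref_words:
        return 0.0
    # Compare word sequences; autojunk=False keeps frequent words in play.
    matcher = SequenceMatcher(None, ref_words, gen_words, autojunk=False)
    # Matching blocks do not overlap, so their sizes can be summed safely.
    covered = sum(
        block.size
        for block in matcher.get_matching_blocks()
        if block.size >= min_words
    )
    return covered / len(ref_words)


# Made-up sample text, standing in for a book passage and a model response.
reference = "the quick brown fox jumps over the lazy dog near the river bank"
generated = "He wrote: the quick brown fox jumps over the lazy dog and then stopped"
print(f"{block_recall(reference, generated):.1%}")
```

Raising `min_words` makes the score stricter, since short coincidental overlaps stop counting toward the total.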

Key Points

1. Extracts substantial copyrighted text from production LLMs including Claude, GPT-4.1, Gemini, and Grok.
2. Demonstrates that memorization persists despite vendor guardrails, raising fair-use and copyright litigation concerns.
3. Urgently requires model audits, stricter training-data transparency, and improved mitigation to reduce legal and privacy risks.

Scoring Rationale

High cross-industry relevance and actionable findings, limited by reliance on a single preprint and pending peer review.

Sources

Public references used for this report.

1 source

1. theregister.com: Boffins probe commercial AI models, find Harry Potter
