Publishers Sue Meta Over Llama Copyright Training

Five major publishers and author Scott Turow filed a proposed class action in Manhattan federal court on May 5, 2026, alleging Meta Platforms used millions of their books and journal articles without permission to train its Llama large-language models, Reuters and The Next Web report. The suit follows earlier litigation, including the 2023 Kadrey case, in which Judge Vince Chhabria granted summary judgment for Meta in June 2025; The Next Web notes Chhabria limited that ruling and invited stronger market-harm evidence from future plaintiffs.
What happened
Five major publishers (Elsevier, Cengage, Hachette, Macmillan, and McGraw Hill), along with author Scott Turow, filed a proposed class action against Meta Platforms in Manhattan federal court on May 5, 2026, alleging the company used millions of copyrighted books and journal articles without permission to train its Llama large-language models, Reuters and The Next Web report. The complaint seeks class certification and unspecified monetary damages, Reuters reports. Reuters quotes a Meta spokesperson: "We will fight this lawsuit aggressively." The Next Web frames the filing as the latest, and substantively different, entry in a broader set of AI-training copyright cases.
Technical details
The complaint alleges the training data included pirated copies of textbooks, scientific articles, and novels; Reuters lists examples including The Fifth Season by N.K. Jemisin and The Wild Robot by Peter Brown. The Next Web recounts facts from earlier litigation (the Kadrey case), noting that the record in that case established Meta employees had torrented roughly 82 terabytes of pirated material and that internal documents showed executives discussed using pirate libraries such as LibGen, Z-Library, and Anna's Archive. The Next Web also notes that Judge Vince Chhabria granted summary judgment for Meta in June 2025 on fair-use grounds while describing the ruling as narrow and conditional.
Industry context
Editorial analysis: Courts and plaintiffs are now testing fair-use defenses beyond the narrow holdings in earlier rulings. Reuters places this suit within a larger wave of litigation in which "dozens of authors, news outlets, visual artists and other plaintiffs" have sued AI companies, including Meta, OpenAI, and Anthropic, over training data. Industry reporting indicates plaintiffs are increasingly using class-action procedures and market-harm theories to try to distinguish new complaints from prior fair-use wins.
For practitioners
Editorial analysis: Companies building or deploying large models should treat training-data provenance and licensing as operational risk vectors. Comparable legal disputes have focused on evidence of market harm, the provenance of large scraped datasets, and whether model outputs are sufficiently "transformative." Organizations operating data pipelines, model registries, and compliance programs will face closer scrutiny of ingestion sources and retention of provenance metadata in any future discovery process.
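As one illustration of what retaining provenance metadata at ingestion might look like, here is a minimal sketch. All names (`ProvenanceRecord`, `record_provenance`) and fields are hypothetical choices for this example, not a reference to any actual pipeline discussed in the litigation:

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    """Hypothetical per-document provenance entry for a training corpus."""
    source_uri: str   # where the text was obtained
    license_id: str   # e.g. an SPDX-style tag, or "unlicensed" if unknown
    sha256: str       # content hash tying the record to the exact bytes ingested
    ingested_at: str  # ISO-8601 UTC timestamp of ingestion


def record_provenance(text: str, source_uri: str, license_id: str) -> ProvenanceRecord:
    """Compute a content hash and stamp the document with source and license info."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return ProvenanceRecord(
        source_uri=source_uri,
        license_id=license_id,
        sha256=digest,
        ingested_at=datetime.now(timezone.utc).isoformat(),
    )


# Example: log one record as JSON alongside the ingested document.
rec = record_provenance("example document text", "https://example.com/doc", "CC-BY-4.0")
print(json.dumps(asdict(rec), indent=2))
```

The point of hashing the content is that, in a later audit or discovery process, a stored record can be matched to the exact bytes that entered the pipeline rather than to a mutable URL.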
What to watch
- Whether the Manhattan court grants class certification, which Reuters notes is sought in the complaint.
- Whether plaintiffs present new evidence of market harm or different facts from the Kadrey record; The Next Web highlights Judge Chhabria's invitation for stronger market-harm evidence.
- Any settlement signals or damages awards in similar cases, and how courts refine the legal standard for using copyrighted material in model training.
Bottom line
Editorial analysis: This filing escalates a sector-wide legal contest over training-data sourcing and fair use. Practitioners should monitor litigation outcomes and evidence-tracing expectations because court rulings could materially affect data procurement, model auditing, and compliance practices.