
Publishers Sue Meta Over Llama Copyright Training

By LDS Team
Relevance Score: 7.1

Five major publishers (Elsevier, Cengage, Hachette, Macmillan, and McGraw Hill) plus author Scott Turow filed a proposed class action in Manhattan federal court on May 5, 2026, alleging Meta Platforms used millions of their books and journal articles without permission to train its Llama large language models, Reuters and The Next Web report. The complaint asks the court to certify a class of similarly situated rights holders and seeks monetary damages, according to Reuters. Reuters also reports a Meta spokesperson said, "We will fight this lawsuit aggressively." The suit follows earlier litigation, including the 2023 Kadrey case, in which Judge Vince Chhabria granted summary judgment for Meta in June 2025; The Next Web notes Chhabria limited that ruling and invited stronger market-harm evidence from future plaintiffs.

What happened

Five major publishers, Elsevier, Cengage, Hachette, Macmillan, and McGraw Hill, along with author Scott Turow, filed a proposed class action against Meta Platforms in Manhattan federal court on May 5, 2026, alleging the company used millions of copyrighted books and journal articles without permission to train its Llama large language models, Reuters and The Next Web report. The complaint seeks class certification and unspecified monetary damages, Reuters reports. Reuters quotes a Meta spokesperson: "We will fight this lawsuit aggressively." The Next Web frames the filing as the latest entry in a broader set of AI-training copyright cases, and as substantively different from those before it.

Technical details

The complaint alleges training data included pirated copies of textbooks, scientific articles, and novels; Reuters lists examples including The Fifth Season by N.K. Jemisin and The Wild Robot by Peter Brown. The Next Web recounts facts from the earlier Kadrey litigation, noting that it established that Meta employees had torrented roughly 82 terabytes of pirated material and that internal documents showed executives discussed use of pirate libraries such as LibGen, Z-Library, and Anna's Archive. The Next Web also notes that Judge Vince Chhabria granted summary judgment for Meta in June 2025 on fair-use grounds while describing the ruling as narrow and conditional.

Industry context

Editorial analysis: Courts and plaintiffs are now testing fair-use defenses beyond the narrow holdings in earlier rulings. Reuters places this suit within a larger wave of litigation in which "dozens of authors, news outlets, visual artists and other plaintiffs" have sued AI companies, including Meta, OpenAI, and Anthropic, over training data. Industry reporting indicates plaintiffs are increasingly using class-action procedures and market-harm theories to try to distinguish new complaints from prior fair-use wins.

For practitioners

Editorial analysis: Companies building or deploying large models should treat training-data provenance and licensing as operational risk vectors. Comparable legal disputes have focused on evidence of market harm, the provenance of large scraped datasets, and whether model outputs are sufficiently "transformative." Organizations operating data pipelines, model registries, and compliance programs will face closer scrutiny of ingestion sources and retention of provenance metadata in any future discovery process.
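As one illustration of what retaining provenance metadata could look like in an ingestion pipeline, the sketch below attaches a content hash, a source label, and a license tag to each document and holds back anything whose license is not on an allow-list. Everything in it is hypothetical and chosen for illustration: the `ProvenanceRecord` fields, the `ALLOWED_LICENSES` values, and the function names are assumptions, not a description of any company's actual tooling or of what a court would require.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical allow-list; real license categories would come from legal review.
ALLOWED_LICENSES = {"licensed", "public-domain", "cc-by-4.0"}


@dataclass(frozen=True)
class ProvenanceRecord:
    """Illustrative per-document metadata kept alongside training data."""
    sha256: str        # content hash, so the exact bytes can be traced later
    source: str        # where the document came from (label or URL)
    license: str       # license tag assigned at ingestion
    ingested_at: str   # UTC timestamp in ISO 8601 format


def make_record(content: bytes, source: str, license: str) -> ProvenanceRecord:
    """Build a provenance record for one document at ingestion time."""
    return ProvenanceRecord(
        sha256=hashlib.sha256(content).hexdigest(),
        source=source,
        license=license,
        ingested_at=datetime.now(timezone.utc).isoformat(),
    )


def partition(records):
    """Split records into (admitted, held) based on the license allow-list."""
    admitted = [r for r in records if r.license in ALLOWED_LICENSES]
    held = [r for r in records if r.license not in ALLOWED_LICENSES]
    return admitted, held


if __name__ == "__main__":
    docs = [
        make_record(b"chapter text", "publisher-feed", "licensed"),
        make_record(b"chapter text", "unknown-mirror", "unknown"),
    ]
    admitted, held = partition(docs)
    print(len(admitted), "admitted;", len(held), "held for review")
```

Because the hash depends only on the content, the same text arriving from two sources yields the same `sha256`, which makes it possible to see that a held document duplicates an admitted one rather than silently merging the two.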

What to watch

  • Whether the Manhattan court grants class certification, which Reuters notes is sought in the complaint.
  • Whether plaintiffs present new evidence of market harm or different facts from the Kadrey record; The Next Web highlights Judge Chhabria's invitation for stronger market-harm evidence.
  • Any settlement signals or damages awards in similar cases, and how courts refine the legal standard for using copyrighted material in model training.

Bottom line

Editorial analysis: This filing escalates a sector-wide legal contest over training-data sourcing and fair use. Practitioners should monitor litigation outcomes and evidence-tracing expectations because court rulings could materially affect data procurement, model auditing, and compliance practices.

Key Points

  • Major publishers filed a proposed class action claiming Meta trained Llama on millions of copyrighted works, raising class-certification stakes.
  • Reporting shows prior rulings were narrow, and plaintiffs now emphasize market-harm evidence to distinguish earlier fair-use decisions.
  • For practitioners, provenance and licensing in training pipelines are becoming critical legal risk controls for model development and deployment.

Scoring Rationale

This is a notable legal escalation involving major publishers and a flagship model, with potential to shift jurisprudence on training-data fair use. Outcomes could materially affect data sourcing and compliance for AI practitioners.
