
Courts Split Over AI Training Fair Use Rulings

By LDS Team
Relevance Score: 7.0

U.S. courts remain split on whether training AI models on copyrighted material is fair use, after Judge William Alsup ruled in June 2025 that training on purchased, digitized books was fair use while Judge Vince Chhabria reached the opposite conclusion two days later in a related San Francisco case. The split has taken on new urgency after Google's president of global affairs, Kent Walker, published a 21-page white paper on June 25, 2026, arguing that regulation should target AI outputs rather than training inputs. That stance clashes with California's Training Data Transparency Act, which focuses on input disclosure; Colorado's AI Act, revised in May 2026, instead focuses on real-world consequential decisions rather than the underlying technology. Publisher group Digital Content Next has sent Common Crawl a cease-and-desist letter rejecting Google's opt-out framework, arguing copyright law "is not an opt-out regime." For practitioners, the unresolved legal and regulatory divide keeps dataset provenance and licensing strategy a live risk factor.

The bigger signal for ML teams is not the year-old court split itself but that the regulatory fight has now shifted to a second front: whether governments should regulate what goes into models (training data disclosure) or what comes out of them (harmful outputs). That divide, not just the pending litigation, is what will shape dataset sourcing and licensing practice over the next year.

What happened

U.S. District Judge William Alsup ruled in June 2025 that training AI models on copyrighted books purchased and digitized for that purpose can qualify as fair use, calling it "quintessentially transformative" (PYMNTS). Two days later, U.S. District Judge Vince Chhabria, also in San Francisco, reached the opposite conclusion in a related case, warning that widespread AI training could undermine the economic incentives behind human creative work (PYMNTS; Reuters). More cases involving Anthropic, Google, and Stability AI are pending in 2026.

Against that backdrop, Google published a 21-page AI governance paper on June 25, 2026, arguing that "this approach would address outputs, not inputs... rather than micromanaging the science behind these new tools," and that training on publicly available web data is "a transformative, non-expressive use... that should remain protected under fair use in the U.S." Google recommends machine-readable opt-out tools such as robots.txt for publishers who object. Digital Content Next rejected that framing in a cease-and-desist letter to the Common Crawl Foundation, arguing copyright law "is not an opt-out regime" (Press Gazette).
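For readers unfamiliar with how a robots.txt-style opt-out works mechanically, the following is a minimal sketch using Python's standard-library `urllib.robotparser`. The crawler name `ExampleAIBot`, the publisher URL, and the robots.txt contents are hypothetical and chosen only for illustration. The sketch shows the technical mechanism only; whether honoring such a file is legally sufficient is exactly what the publishers dispute.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve to opt one crawler out
# while leaving the site open to all other user agents.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://publisher.example/articles/1"
    for agent in ("ExampleAIBot", "OtherBot"):
        print(agent, is_allowed(ROBOTS_TXT, agent, url))
```

A crawler that respects the convention would run a check like this before fetching each page; nothing in the protocol itself forces a crawler to do so.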

Timeline

  1. Judge William Alsup rules that training AI models on purchased, digitized copyrighted books can be fair use (June 2025).

  2. Judge Vince Chhabria reaches the opposite conclusion in a related San Francisco case involving AI training (two days later, June 2025).

  3. California's Training Data Transparency Act takes effect, requiring AI developers to disclose training-data sources.

  4. Google publishes a white paper arguing AI regulation should target outputs rather than training inputs (June 25, 2026).

  5. PYMNTS reports that Digital Content Next has sent Common Crawl a cease-and-desist letter over web-scraping opt-outs.

Regulatory context

California and the EU have taken an input-focused approach, requiring disclosure of training-data sources, collection periods, and copyright status. Colorado moved the other way, revising its AI Act in May 2026 to focus on real-world consequential decisions rather than the underlying technology, per a Mintz analysis cited by PYMNTS. Google's paper explicitly sides with the output-based camp, and separately proposes a federally overseen, industry-funded body (which it calls FARO) to set safety standards for frontier models.

For practitioners

The Alsup/Chhabria split has not been resolved by appeal or settled into binding precedent, so dataset provenance, retention practices, and documented transformation remain the concrete factors courts have weighed. Teams should not treat Google's fair-use argument or the output-based regulatory framing as settled law; statutory obligations in California and Colorado already apply regardless of how the fair-use question is eventually resolved, and licensing exposure differs by jurisdiction.
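One way a team might keep provenance auditable is a per-dataset record that tracks the fields named under Regulatory context (source, collection period, copyright status). The sketch below is illustrative only: the class `DatasetRecord`, the function `provenance_gaps`, and the status labels are hypothetical names, not a schema drawn from any statute or from the sources cited here.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical provenance record for one training dataset."""
    name: str
    source: str                       # where the data was obtained
    collected_from: Optional[date]    # start of the collection period
    collected_to: Optional[date]      # end of the collection period
    copyright_status: Optional[str]   # assumed labels: "licensed", "public-domain", "unknown"
    license_ref: Optional[str] = None # pointer to the license or contract, if any


def provenance_gaps(record: DatasetRecord) -> List[str]:
    """List the provenance fields that are missing or unresolved."""
    gaps = []
    if not record.source:
        gaps.append("source")
    if record.collected_from is None or record.collected_to is None:
        gaps.append("collection_period")
    if record.copyright_status in (None, "unknown"):
        gaps.append("copyright_status")
    if record.copyright_status == "licensed" and not record.license_ref:
        gaps.append("license_ref")
    return gaps


if __name__ == "__main__":
    scraped = DatasetRecord(
        name="web-crawl-sample",
        source="https://crawler.example/dump",
        collected_from=date(2025, 1, 1),
        collected_to=None,
        copyright_status="unknown",
    )
    print(provenance_gaps(scraped))
```

Running a check like this over every dataset gives a list of open questions to take to counsel; it does not determine whether any particular use is lawful.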

What to watch

Whether appellate courts or additional district rulings in the pending Anthropic, Google, and Stability AI cases narrow or widen the Alsup/Chhabria split; how regulators respond to Google's FARO proposal and output-based framing; and whether the Digital Content Next-Common Crawl dispute produces a test case on whether robots.txt-style opt-outs are legally sufficient.

Editorial analysis

The pattern here is a familiar one in emerging-tech regulation: dominant players that already operate at scale tend to favor output-based, harms-focused rules that are lighter on upfront compliance, while critics warn that structural concentration in compute, data, and distribution needs input-side scrutiny to be addressed at all. Neither framework has won yet, and practitioners should expect both fronts to keep moving in parallel.

Key Points

  • A year-old split between Judges Alsup and Chhabria on AI-training fair use remains unresolved as more cases proceed in 2026.
  • Google's June 2026 governance paper argues for output-based AI regulation, clashing with input-disclosure rules such as California's Training Data Transparency Act.
  • Publishers are contesting Google's opt-out framework via a cease-and-desist to Common Crawl, keeping dataset licensing a live compliance risk.

Scoring Rationale

The underlying court split is a year old and unresolved, but Google's new output-based governance paper and the divide between California's input-disclosure rules and Colorado's focus on consequential decisions give it fresh regulatory relevance for 2026 dataset and licensing decisions. Notable operational impact for ML teams, though not yet a settled precedent or industry-shaking ruling.

