The bigger signal for ML teams is not the year-old court split itself but that the regulatory fight has opened a second front: whether governments should regulate what goes into models (training-data disclosure) or what comes out of them (harmful outputs). That divide, not just the pending litigation, is likely to shape dataset sourcing and licensing practice over the next year.
What happened
U.S. District Judge William Alsup ruled in June 2025 that training AI models on copyrighted books purchased and digitized for that purpose can qualify as fair use, calling it "quintessentially transformative" (PYMNTS). Two days later, U.S. District Judge Vince Chhabria, also in San Francisco, took a sharply different view in a separate case against Meta. Although he ruled for Meta on the record before him, he warned that widespread AI training could undermine the economic incentives behind human creative work (PYMNTS; Reuters). More cases involving Anthropic, Google and Stability AI are pending in 2026.

Against that backdrop, Google published a 21-page AI governance paper on June 25, 2026. The paper argues that "this approach would address outputs, not inputs... rather than micromanaging the science behind these new tools," and that training on publicly available web data is "a transformative, non-expressive use... that should remain protected under fair use in the U.S." Google recommends machine-readable opt-out tools such as robots.txt for publishers who object. Digital Content Next rejected that framing in a cease-and-desist letter to the Common Crawl Foundation, arguing that copyright law "is not an opt-out regime" (Press Gazette).
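The robots.txt opt-out mechanism Google recommends is the Robots Exclusion Protocol (RFC 9309). It is a plain-text file in which a site lists per-crawler rules. The sketch below assumes a hypothetical publisher file and uses Python's standard-library `urllib.robotparser` to check whether Common Crawl's crawler (`CCBot`) and Google's `Google-Extended` token, which Google offers for opting content out of use in its AI models, may fetch a page. Two caveats apply. The stdlib parser applies the first matching rule rather than every RFC 9309 precedence detail. Honoring robots.txt is also voluntary, and whether it is legally sufficient is exactly what Digital Content Next disputes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example publisher: it opts out of
# Common Crawl and Google's AI-training token but allows other crawlers.
EXAMPLE_ROBOTS_TXT = """\
User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def crawler_permissions(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, for each user agent, whether robots_txt permits fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines and marks the rules as loaded,
    # so can_fetch() works without a network read().
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    perms = crawler_permissions(
        EXAMPLE_ROBOTS_TXT,
        "https://publisher.example/articles/123",  # hypothetical URL
        ["CCBot", "Google-Extended", "Googlebot"],
    )
    print(perms)  # {'CCBot': False, 'Google-Extended': False, 'Googlebot': True}
```

Blocking `Google-Extended` while allowing `Googlebot` is the kind of selective opt-out Google's paper points to. A robots.txt rule only governs future fetches by crawlers that choose to comply, which is part of why publishers argue it cannot substitute for permission.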
Timeline
Judge William Alsup rules that training AI models on purchased, digitized copyrighted books can be fair use.
Judge Vince Chhabria rules for Meta in a separate San Francisco AI-training case but warns that such training could undermine the incentives behind human creative work.
California's Training Data Transparency Act takes effect, requiring AI developers to disclose training-data sources.
Google publishes a white paper arguing AI regulation should target outputs rather than training inputs.
PYMNTS reports that Digital Content Next has sent Common Crawl a cease-and-desist letter over web-scraping opt-outs.
Regulatory context
California and the EU have taken an input-focused approach, requiring disclosure of training-data sources, collection periods, and copyright status. Colorado moved the other way, revising its AI Act in May 2026 to focus on real-world consequential decisions rather than the underlying technology, per a Mintz analysis cited by PYMNTS. Google's paper explicitly sides with the output-based camp and separately proposes a federally overseen, industry-funded body (which it calls FARO) to set safety standards for frontier models.
For practitioners
The Alsup/Chhabria split has not been resolved on appeal or settled into binding precedent, so dataset provenance, retention practices, and documented transformation remain the concrete factors courts have weighed. Teams should not treat Google's fair-use argument or the output-based regulatory framing as settled law. California's training-data disclosure rules and Colorado's obligations around consequential decisions apply regardless of how the fair-use question is eventually resolved, and licensing exposure differs by jurisdiction.
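One way to make provenance, retention, and transformation auditable is to keep a per-source record alongside the dataset. The sketch below uses a hypothetical schema. The `ProvenanceRecord` fields, the acquisition labels, and the `needs_review` rules are illustrative assumptions. They are not requirements drawn from either ruling or from California's or Colorado's statutes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class ProvenanceRecord:
    """Hypothetical per-source provenance entry for a training dataset."""
    source_id: str
    acquisition: str                # illustrative labels: "purchased", "licensed", "scraped"
    license_terms: Optional[str]    # None when no license terms are documented
    retrieved_on: date
    robots_allowed: Optional[bool]  # opt-out check result at crawl time; None if not checked
    retention: str                  # illustrative: "retained", "deleted-after-training"
    transformation_note: str        # how the source is used in training


def needs_review(rec: ProvenanceRecord) -> list[str]:
    """Return reasons a record might warrant legal review (illustrative rules only)."""
    reasons = []
    if rec.license_terms is None and rec.acquisition != "purchased":
        reasons.append("no documented license")
    if rec.acquisition == "scraped" and rec.robots_allowed is False:
        reasons.append("crawled despite robots.txt opt-out")
    if rec.acquisition == "scraped" and rec.robots_allowed is None:
        reasons.append("no opt-out check recorded")
    if not rec.transformation_note.strip():
        reasons.append("transformation not documented")
    return reasons


if __name__ == "__main__":
    records = [
        ProvenanceRecord("book-0001", "purchased", None, date(2025, 1, 15), None,
                         "retained", "scanned and tokenized for pretraining"),
        ProvenanceRecord("web-0042", "scraped", None, date(2026, 3, 2), False,
                         "retained", ""),
    ]
    for rec in records:
        print(rec.source_id, needs_review(rec))
```

Keeping the flags as human-readable reasons, rather than a single pass/fail bit, leaves the legal judgment to reviewers while still making gaps in the record visible.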
What to watch
- Whether appellate courts or additional district rulings in the pending Anthropic, Google, and Stability AI cases narrow or widen the Alsup/Chhabria split.
- How regulators respond to Google's FARO proposal and its output-based framing.
- Whether the dispute between Digital Content Next and Common Crawl produces a test case on whether robots.txt-style opt-outs are legally sufficient.
Editorial analysis
The pattern here is a familiar one in emerging-tech regulation: dominant players that already operate at scale tend to favor output-based, harms-focused rules that are lighter on upfront compliance, while critics warn that structural concentration in compute, data, and distribution needs input-side scrutiny to be addressed at all. Neither framework has won yet, and practitioners should expect both fronts to keep moving in parallel.
Key Points
- A year-old split between Judges Alsup and Chhabria on AI-training fair use remains unresolved as more cases proceed in 2026.
- Google's June 2026 governance paper argues for output-based AI regulation, clashing with the input-disclosure approach taken by California and the EU.
- Publishers are contesting Google's opt-out framework via a cease-and-desist to Common Crawl, keeping dataset licensing a live compliance risk.
Scoring Rationale
The underlying court split is a year old and unresolved, but Google's new output-based governance paper and the divide between California's input-disclosure rules and Colorado's decision-focused approach give it fresh regulatory relevance for 2026 dataset and licensing decisions. Notable operational impact for ML teams, though not yet a settled precedent or industry-shaking ruling.
Sources
Public references used for this report.
- AI training is 'fair use' federal judge rules in Anthropic copyright case (Fortune, fortune.com)
- US judge sides with Meta in AI training copyright case (France 24, france24.com)
- A pragmatic approach to AI governance in America (Google, blog.google)
- AI Companies Prevail in Path-Breaking Decisions on Fair Use (Crowell, crowell.com)

