New York Times Publisher Condemns AI Firms Over News Use
A.G. Sulzberger, publisher of The New York Times, used his opening keynote at the WAN-IFRA World News Media Congress on June 1, 2026 to sharply criticise major AI companies for what he called a "brazen theft of intellectual property," according to prepared remarks published by The New York Times and republished by WAN-IFRA and the Reuters Institute. Sulzberger said tech giants are "strip-mining" news websites without permission or compensation and that their products are "hijacking the public square," while noting publishers create "original, high-quality content" used to train models. He also referenced the rapid rise of ChatGPT, noting its early user growth, in describing the scale of the change. Sulzberger added that his newsroom is using AI "responsibly" and with human oversight, per the speech text.
What happened
A.G. Sulzberger, publisher and chairman of The New York Times, delivered opening remarks at the WAN-IFRA World News Media Congress in Marseille on June 1, 2026. In prepared remarks published by The New York Times and republished by WAN-IFRA and the Reuters Institute, Sulzberger said AI companies are engaging in a "brazen theft of intellectual property" and accused tech platforms of "strip-mining" news websites without permission or compensation. He said this dynamic is enabling what he called a "hijacking of the public square." The speech cites the consumer-scale uptake of ChatGPT, noting its early surge to 100 million users, as context for how quickly large language models entered mainstream use. The published remarks name major AI players, including OpenAI, Anthropic, Google, Meta, Microsoft, and X.
Editorial analysis - technical context
Industry-pattern observations: Public reporting and technical literature show that contemporary large language models typically rely on large-scale web-crawled corpora, which often include news content. Companies training generative models commonly ingest varied internet text to achieve broad language capability. News organizations' original reporting is especially useful to models because it contains factual narratives, named entities, and up-to-date information. Sulzberger's complaint highlights this technical dependence by framing news content as a targeted source of training signal for AI products.
Industry context
Editorial analysis: Sulzberger's remarks blend legal, commercial, and civic concerns. He characterises the behaviour of AI platforms as violating "settled law," a high-stakes claim that appears in his prepared remarks as published by The New York Times and republished by WAN-IFRA. Public coverage frames the dispute in terms of compensation, attribution, and the downstream economics of attention and advertising. Similar disputes between platforms and rights holders have produced licensing deals, publisher consortium negotiations, and litigation in prior years, according to reporting on those episodes.
For practitioners
Editorial analysis: For ML engineers, data scientists, and platform teams, the most immediate implications are practical and operational. Data provenance, licensing status, and the traceability of training corpora are rising priorities across the industry. Organizations building or deploying generative systems will increasingly face pressure to document sourcing and to adopt technical controls for filtering or excluding paywalled, copyrighted, or otherwise licensed content. This pressure comes from publishers and commercial partners, and from the prospect of regulatory scrutiny, as documented in public reporting of publisher-platform tensions.
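As a rough illustration of what such a control might look like, the Python sketch below attaches provenance metadata to each candidate document and splits a corpus into kept and excluded items, recording a reason for every exclusion so the decision can be audited later. All names here (SourceRecord, ALLOWED_LICENSES, partition_corpus) and the license labels are hypothetical, chosen for illustration; they do not describe any particular company's pipeline, and a real system would need its own license taxonomy and legal review.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical license labels; an actual pipeline would define its own taxonomy.
ALLOWED_LICENSES = {"public_domain", "cc_by", "licensed_by_agreement"}


@dataclass(frozen=True)
class SourceRecord:
    """Provenance metadata for one candidate training document (illustrative)."""
    url: str
    publisher: str
    license_status: str  # e.g. "cc_by", "unknown", "all_rights_reserved"
    paywalled: bool


def partition_corpus(
    records: Iterable[SourceRecord],
) -> Tuple[List[SourceRecord], List[Tuple[SourceRecord, str]]]:
    """Split records into (kept, excluded); each exclusion carries a reason."""
    kept: List[SourceRecord] = []
    excluded: List[Tuple[SourceRecord, str]] = []
    for record in records:
        if record.paywalled:
            excluded.append((record, "paywalled"))
        elif record.license_status not in ALLOWED_LICENSES:
            excluded.append((record, f"license:{record.license_status}"))
        else:
            kept.append(record)
    return kept, excluded


# Made-up example data using the reserved example.org domain.
records = [
    SourceRecord("https://example.org/a", "Example Press", "cc_by", False),
    SourceRecord("https://example.org/b", "Example Press", "all_rights_reserved", False),
    SourceRecord("https://example.org/c", "Example Wire", "cc_by", True),
]
kept, excluded = partition_corpus(records)
for record, reason in excluded:
    print(f"excluded {record.url}: {reason}")
```

Keeping the exclusion reason next to each record is one simple way to make sourcing decisions traceable; the default here is to exclude anything whose license status is not explicitly on the allow list, including "unknown".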
What to watch
For practitioners: Indicators to follow include reported litigation or formal licensing agreements between publishers and platform companies; announcements by model providers about changes to training data policies or metadata transparency; coordinated action by news publishers or trade associations; and any regulatory proposals that would clarify copyright liability for model training. Also watch product changes by major platforms that affect how news is surfaced or summarized, since those are the practical interfaces where publishers report audience and revenue impacts.
Bottom line
Editorial analysis: Sulzberger's speech crystallises a growing, well-documented conflict between news publishers and builders of generative AI. The episode reinforces a broader industry trend toward stricter scrutiny of training-data provenance and commercial terms for reused creative work. Practitioners should treat data sourcing, licensing, and auditability as operational priorities when training or deploying generative models.
Scoring Rationale
The story highlights escalating, sector-wide conflict over training data and copyright that affects many organisations building generative models. It is not a frontier-model release but carries material legal and operational implications for practitioners.
