Cloudflare and GoDaddy Shield Websites from AI Crawlers
Cloudflare and GoDaddy announced a partnership to give website operators new controls over crawlers run by large AI firms, framing the effort as a fix for the web's "broken business model" in the AI era. The collaboration aims to let site owners decide how automated AI agents access, index, or reuse their content, a direct response to large-scale scraping by major AI companies. For practitioners, this changes the negotiation around training-data access, web-scale crawling ethics, and infrastructure-level bot management. Expect product integrations that surface crawl policies and enforce them at the edge, shifting some control back to publishers and site operators.
What happened
Cloudflare and GoDaddy have partnered to give site owners more control over automated AI crawlers operated by major AI companies, positioning the effort as a remedy for the web's "broken business model" in the AI era. The stated intent is to let site owners exercise greater authority over how AI bots access and reuse their content.
Technical context
Large language models and other generative systems rely on vast volumes of web-scraped data. That flow of content from publishers into training datasets has created tension: publishers report lost traffic, monetization, and control, while AI companies depend on broad crawl access to build models. Edge providers and registrars sit at the intersection of traffic management and site ownership, so a partnership between an edge/network platform and a major registrar/hosting ecosystem directly targets policy definition and enforcement at scale.
Key details from sources
The announcement frames the initiative as addressing the web’s broken business model in the AI era by giving sites control over AI crawlers. Cloudflare and GoDaddy will work together to provide mechanisms for websites to manage how AI agents crawl and consume content. The coverage centers on the collaboration as a response to large-scale automated crawling by prominent AI firms.
Why practitioners should care
This partnership touches two operational domains for ML and site teams. For data engineers and ML researchers, it signals growing friction around lawful, automated web data collection: expect more granular access controls and opt-in/opt-out mechanics at the HTTP and DNS layers (see the sketch below). For SREs and security engineers, it means integrating bot detection and policy enforcement into deployment and crawling pipelines. Organizations that rely on public web scraping for training or evaluation should track these controls to avoid unexpected data-availability changes and to adjust data procurement and compliance processes.
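As a concrete illustration of the opt-out mechanics already in circulation, the sketch below parses a robots.txt file that singles out AI crawlers by user agent and gates fetches on it. The sample rules and the pipeline around them are illustrative assumptions, not anything announced by either company; GPTBot, CCBot, and ClaudeBot are crawler user-agent tokens publicly documented by OpenAI, Common Crawl, and Anthropic.

```python
import urllib.robotparser

# Illustrative robots.txt of the kind publishers already use to opt out of
# AI training crawls; the specific rules here are made up for this sketch.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_policy(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Parse robots.txt text into a queryable policy object (stdlib only)."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(policy: urllib.robotparser.RobotFileParser,
              user_agent: str, path: str) -> bool:
    """Gate a crawl request on the site's published policy."""
    return policy.can_fetch(user_agent, path)


if __name__ == "__main__":
    policy = build_policy(SAMPLE_ROBOTS_TXT)
    for agent in ("GPTBot", "CCBot", "ResearchBot"):
        for path in ("/articles/1", "/private/data"):
            allowed = may_fetch(policy, agent, path)
            print(f"{agent:12s} {path:16s} allowed={allowed}")
```

A gate like this is the minimum a compliant collection pipeline already needs; whatever the partnership ships would presumably add enforcement on the server side rather than relying on crawler cooperation.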
What to watch
Watch for product announcements detailing how controls are expressed (e.g., headers, robots-related signals, API-based permissions), rollout timelines across GoDaddy-hosted sites, and whether major AI firms negotiate standardized access or new licensing flows. Also monitor legal and industry responses from publishers and AI companies.
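If enforcement lands at the edge, the most visible symptom for crawl pipelines will be HTTP-level refusals. The probe below sends a HEAD request under a declared bot user agent and reports the status code plus any policy-looking response header. This is a sketch under stated assumptions: the user-agent string and the X-Crawl-Policy header name are hypothetical placeholders, since no signal format has been published.

```python
import urllib.request
from urllib.error import HTTPError, URLError

# Hypothetical self-identifying user agent for an ML data-collection pipeline.
BOT_USER_AGENT = "ExampleDataBot/1.0 (+https://example.com/bot)"


def probe_crawl_policy(url: str) -> None:
    """Send a HEAD request as a declared bot and report how the edge responds.

    Status codes worth watching for in practice: 403 (blocked outright) and,
    if pay-per-crawl schemes emerge, 402 (payment required).
    """
    req = urllib.request.Request(
        url, headers={"User-Agent": BOT_USER_AGENT}, method="HEAD"
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            print(f"{url}: HTTP {resp.status}")
            # "X-Crawl-Policy" is a made-up header name standing in for
            # whatever signal the providers eventually standardize on.
            policy = resp.headers.get("X-Crawl-Policy")
            if policy:
                print(f"  crawl policy header: {policy}")
    except HTTPError as err:
        verdict = {402: "payment required", 403: "blocked"}.get(err.code, "error")
        print(f"{url}: HTTP {err.code} ({verdict})")
    except URLError as err:
        print(f"{url}: unreachable ({err.reason})")


if __name__ == "__main__":
    probe_crawl_policy("https://example.com/")
```

Running a probe like this across a sample of target domains, per user agent, is a cheap way to detect when new edge controls start changing data availability.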
Scoring Rationale
The partnership matters for practitioners because it directly affects web data access and bot-management at scale, which are core to data collection and infrastructure. It's not a fundamental AI research breakthrough, but it changes operational realities for ML teams and publishers.