Google Sells TPU v8 to Third-Party Data Centers

IO Fund's Beth Kindig reports that Google will begin selling its TPUs to select third-party data center operators, marking its entry into the merchant AI accelerator market dominated by Nvidia. Per IO Fund, Google's latest results showed a cloud backlog of about $462 billion; Alphabet's Q1 2026 filing separately reported roughly $467.6 billion in total revenue backlog, largely tied to Cloud. The firm argues that rising AI inference workloads make custom silicon more attractive for non-hyperscalers. Google's eighth-generation TPUs, unveiled at Google Cloud Next 2026, split into the TPU 8t (training) and TPU 8i (inference); IO Fund highlights the pods' large coherent shared memory as a differentiator versus Nvidia GPU clusters. Expanding custom-accelerator sales into third-party channels typically increases vendor diversity and sharpens price-performance competition for inference.
What happened
IO Fund's Beth Kindig reports that Google will begin selling its TPUs to select third-party data center operators, which IO Fund frames as Google entering the merchant AI accelerator market. IO Fund notes Google's latest earnings included a cloud backlog it describes as up about 400% year over year to roughly $462 billion, and argues the company is capitalizing on shifting workload economics toward inference. Alphabet's Q1 2026 filing independently reported total revenue backlog of about $467.6 billion, largely attributed to Google Cloud.
Technical details
Google's eighth-generation TPUs, announced at Google Cloud Next 2026, split into two designs: the TPU 8t for training and the TPU 8i for inference, the first time Google has bifurcated the line. Per IO Fund, the new TPU pods emphasize a large coherent shared memory as a key architectural differentiator relative to Nvidia GPU clusters, which IO Fund presents as enabling different scaling and latency trade-offs for high-throughput inference serving.
Editorial analysis - technical context
As AI workloads shift from expensive training runs toward high-volume inference, total cost of ownership and cost per token become primary purchasing considerations for operators. A vendor that sells custom accelerators into third-party data centers creates an alternative procurement path to incumbent GPU suppliers, which can reshape price-performance negotiations and long-term capacity planning for inference.
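To make the cost-per-token framing concrete, the following is a minimal sketch of how an operator might fold hardware and energy costs into a single figure. It is not taken from IO Fund or Google: the `AcceleratorPlan` fields, the `cost_per_million_tokens` helper, and the assumption that power draw is constant regardless of load are all hypothetical simplifications, and any numbers fed into it would have to come from an operator's own quotes and benchmarks.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass(frozen=True)
class AcceleratorPlan:
    """Hypothetical per-accelerator inputs; none of these come from the article."""
    capex_usd: float           # purchase price per accelerator
    lifetime_years: float      # straight-line depreciation period
    power_kw: float            # average draw, assumed constant regardless of load
    energy_usd_per_kwh: float  # electricity price
    tokens_per_second: float   # sustained throughput while serving
    utilization: float         # fraction of time spent serving, 0 < u <= 1


def cost_per_million_tokens(plan: AcceleratorPlan) -> float:
    """Return amortized hardware plus energy cost in USD per one million tokens."""
    if not 0 < plan.utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Fixed hourly costs: depreciation and energy accrue whether or not traffic arrives.
    hourly_capex = plan.capex_usd / (plan.lifetime_years * HOURS_PER_YEAR)
    hourly_energy = plan.power_kw * plan.energy_usd_per_kwh
    # Tokens actually served per wall-clock hour.
    tokens_per_hour = plan.tokens_per_second * 3600 * plan.utilization
    return (hourly_capex + hourly_energy) / tokens_per_hour * 1_000_000
```

Under this simplified model, hourly costs are fixed, so halving utilization doubles the cost per token. That is one reason measured utilization can matter as much as headline throughput when comparing accelerators. A fuller comparison would also account for networking, facilities, staffing, and software porting costs, which this sketch omits.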
Context and significance
IO Fund frames the merchant-sales push as coinciding with three converging dynamics: rising inference share, pressure on hyperscalers to monetize models, and what IO Fund calls a potential 'Rubin delay' for Nvidia. IO Fund argues these could together open a window for custom silicon to gain inference share, and points to signals of merchant traction such as a Google-Blackstone AI cloud joint venture targeting an initial 0.5 GW of TPU capacity in 2027 and a multi-gigawatt Anthropic TPU commitment. For practitioners, hardware diversity at the data-center level influences deployment architectures, latency envelopes, model partitioning, and inference cost optimization.
What to watch
- Reported uptake by third-party operators and any published performance or power-efficiency comparisons between TPU 8i pods and Nvidia GPU clusters.
- Benchmarks for inference cost per token and latency at scale on coherent-shared-memory TPU topologies versus GPU-based sharded approaches.
- Integration and orchestration constraints operators describe when fitting TPUs into existing inference stacks.
Google's merchant-sales debut is best read as an inflection in inference hardware procurement options rather than a guaranteed displacement of GPU incumbents; adoption at scale will depend on measured cost per inference, software maturity, and operator integration work.
Key Points
1. Google's move toward merchant TPU sales expands procurement options and pressures incumbent GPU suppliers on inference cost and vendor concentration.
2. The v8 generation splits into TPU 8t (training) and TPU 8i (inference); IO Fund flags large coherent shared memory as the key pod-level differentiator.
3. Adoption will hinge on benchmarked cost-per-token, power efficiency, and software-ecosystem maturity, not just headline architecture claims.
Scoring Rationale
Google moving TPUs into third-party data centers is a notable infrastructure shift that expands hardware options for inference and pressures GPU incumbents, corroborated by independent reporting on the TPU 8t/8i launch and merchant deals. It materially affects deployment economics and benchmarking priorities for ML practitioners and cloud operators, though near-term impact depends on measured cost-per-inference and ecosystem maturity.