Google Sells TPU v8 to Third-Party Data Centers

IO Fund's analysis reports that Google announced it will begin selling its TPUs to select third-party data center operators, marking an entrance into the merchant AI accelerator market. IO Fund's Beth Kindig reports Google's latest earnings showed cloud backlog up 400% year-over-year to $462 billion, and argues that a rising share of AI inference workloads is making custom silicon more economically attractive for non-hyperscaler operators. IO Fund also highlights the TPU v8 pod architecture, notably its large coherent shared memory, as a differentiator versus Nvidia GPU systems.
Editorial analysis: Companies expanding custom-accelerator sales into third-party channels typically increase vendor diversity and sharpen price-performance competition for inference workloads.
What happened
IO Fund's Beth Kindig reports that Google announced it will begin selling its TPUs to select third-party data center operators, which IO Fund frames as Google entering the merchant AI accelerator market. IO Fund notes Google's latest earnings included cloud backlog growth of 400% year-over-year to $462 billion, and argues the company used the announcement to capitalise on shifting workload economics toward inference.
Technical details
Per IO Fund, the newly released TPU v8 generation and associated TPU pods emphasise a large coherent shared memory as a key architectural differentiator relative to Nvidia GPU clusters. IO Fund's writeup argues that this shared-memory topology gives TPU pods different scaling and latency trade-offs for high-throughput inference serving.
Editorial analysis - technical context
Industry-pattern observations: As the share of AI workloads moves from expensive training runs toward high-volume inference, total cost of ownership and cost per token become primary purchasing considerations for operators. Companies that sell custom accelerators into third-party data centers can create an alternative procurement path to incumbent GPU suppliers, potentially changing price-performance negotiations and long-term capacity planning for inference.
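To make the cost-per-token framing concrete, the sketch below shows one simple way an operator might turn total-cost-of-ownership inputs into a cost per million tokens. It is a minimal illustration under stated assumptions, not a model from IO Fund or Google: the `AcceleratorProfile` fields, the straight-line amortisation, and the choice to charge power for every hour regardless of utilisation are all hypothetical simplifications, and no figures here describe real TPU or GPU hardware.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


@dataclass(frozen=True)
class AcceleratorProfile:
    """Hypothetical inputs for one accelerator; every field is illustrative."""
    name: str
    capex_usd: float                # purchase price per accelerator
    amortization_years: float       # straight-line depreciation period
    power_kw: float                 # average draw, including cooling overhead
    electricity_usd_per_kwh: float  # blended energy price
    tokens_per_second: float        # sustained inference throughput when busy
    utilization: float              # fraction of hours serving traffic, in (0, 1]


def hourly_tco_usd(p: AcceleratorProfile) -> float:
    """Amortised hardware cost plus energy cost for one hour of operation."""
    capex_per_hour = p.capex_usd / (p.amortization_years * HOURS_PER_YEAR)
    # Assumption: power is paid for every hour, whether or not traffic is served.
    energy_per_hour = p.power_kw * p.electricity_usd_per_kwh
    return capex_per_hour + energy_per_hour


def cost_per_million_tokens_usd(p: AcceleratorProfile) -> float:
    """Hourly cost divided by the tokens actually served in that hour."""
    if not 0 < p.utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = p.tokens_per_second * 3600 * p.utilization
    return hourly_tco_usd(p) / tokens_per_hour * 1_000_000


def cheaper_option(a: AcceleratorProfile, b: AcceleratorProfile) -> AcceleratorProfile:
    """Return whichever profile has the lower cost per million tokens."""
    return min((a, b), key=cost_per_million_tokens_usd)
```

In this simplified model, utilisation and sustained throughput move the result as much as purchase price does, which is one reason the measured benchmarks listed under "What to watch" matter more than list prices. A fuller model would also include networking, facility, software, and staffing costs.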
Context and significance
IO Fund frames Google's merchant-sales announcement as coinciding with three converging dynamics: rising inference share, economic pressure on hyperscalers to monetise models, and what IO Fund calls a potential "Rubin delay" for Nvidia. Together, IO Fund argues, these open a window for custom silicon to gain share in inference. For practitioners, this matters because hardware diversity at the data-center level influences deployment architectures, latency envelopes, model partitioning, and inference cost optimisation strategies.
What to watch
- Reported uptake by third-party data center operators and any published performance or power-efficiency comparisons between TPU v8 pods and Nvidia GPU clusters.
- Benchmarks showing inference cost per token and latency at scale on coherent-shared-memory TPU topologies versus GPU-based sharded approaches.
- Any public statements or configuration guides from data-center operators describing integration, orchestration, or compatibility constraints with existing inference stacks.
Editorial analysis: Observers should treat Google's merchant-sales debut as an inflection in hardware procurement options for inference workloads rather than a guaranteed displacement of GPU incumbents. Hardware adoption at scale will depend on measured cost-per-inference, ecosystem software maturity, and operator integration work.
Scoring Rationale
Moving TPUs into third-party data centers is a notable infrastructure shift that increases hardware vendor choices for inference. This affects deployment economics and benchmarking priorities for ML practitioners and cloud operators.
