Google Sells TPU v8 to Third-Party Data Centers

June 8, 2026 | By LDS Team

Relevance Score: 7.3

IO Fund's Beth Kindig reports that Google will begin selling its TPUs to select third-party data center operators, marking an entrance into the merchant AI accelerator market dominated by Nvidia. Per IO Fund, Google's latest results showed a cloud backlog of about $462 billion (Alphabet's Q1 2026 filing reported roughly $467.6 billion in total revenue backlog, largely tied to Cloud), and the firm argues rising AI inference workloads make custom silicon more attractive for non-hyperscalers. Google's eighth-generation TPUs, unveiled at Google Cloud Next 2026, split into the TPU 8t (training) and TPU 8i (inference); IO Fund highlights the pods' large coherent shared memory as a differentiator versus Nvidia GPU clusters. Expanding custom-accelerator sales into third-party channels typically increases vendor diversity and sharpens price-performance competition for inference.

What happened

IO Fund's Beth Kindig reports that Google will begin selling its TPUs to select third-party data center operators, which IO Fund frames as Google entering the merchant AI accelerator market. IO Fund notes Google's latest earnings included a cloud backlog it describes as up about 400% year over year to roughly $462 billion, and argues the company is capitalizing on shifting workload economics toward inference. Alphabet's Q1 2026 filing independently reported total revenue backlog of about $467.6 billion, largely attributed to Google Cloud.

Technical details

Google's eighth-generation TPUs, announced at Google Cloud Next 2026, split into two designs: the TPU 8t for training and the TPU 8i for inference, the first time Google has bifurcated the line. Per IO Fund, the new TPU pods emphasize a large coherent shared memory as a key architectural differentiator relative to Nvidia GPU clusters, which IO Fund presents as enabling different scaling and latency trade-offs for high-throughput inference serving.

Editorial analysis - technical context

As AI workloads shift from expensive training runs toward high-volume inference, total cost of ownership and cost per token become primary purchasing considerations for operators. A vendor that sells custom accelerators into third-party data centers creates an alternative procurement path to incumbent GPU suppliers, which can reshape price-performance negotiations and long-term capacity planning for inference.
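
The cost-per-token framing above can be made concrete with a rough calculation. The following is a minimal sketch, assuming a simplified model in which hourly cost is amortized hardware spend plus electricity; the function name, parameters, and every input value are hypothetical illustrations and are not figures from IO Fund, Google, or Nvidia.

```python
HOURS_PER_YEAR = 24 * 365


def cost_per_million_tokens(capex_usd, amortization_years, power_kw,
                            usd_per_kwh, tokens_per_second, utilization):
    """Estimate serving cost in USD per one million generated tokens.

    Simplifying assumptions (not from the article): hardware cost is
    amortized linearly, power is drawn at a constant rate regardless of
    load, and staffing, networking, and facility costs are ignored.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    hourly_capex = capex_usd / (amortization_years * HOURS_PER_YEAR)
    hourly_power = power_kw * usd_per_kwh
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return (hourly_capex + hourly_power) / tokens_per_hour * 1_000_000


# Hypothetical inputs, chosen only to show the arithmetic.
estimate = cost_per_million_tokens(
    capex_usd=87_600, amortization_years=1, power_kw=10,
    usd_per_kwh=0.10, tokens_per_second=1_000, utilization=1.0,
)
print(f"{estimate:.2f} USD per million tokens")
```

Under this model, the same comparison can be rerun for any two platforms once an operator has measured throughput and power draw; the formula makes visible that utilization and amortization period can matter as much as raw chip speed.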

Context and significance

IO Fund frames the merchant-sales push as coinciding with three converging dynamics: rising inference share, pressure on hyperscalers to monetize models, and what IO Fund calls a potential 'Rubin delay' for Nvidia. IO Fund argues these could together open a window for custom silicon to gain inference share, and points to signals of merchant traction such as a Google-Blackstone AI cloud joint venture targeting an initial 0.5 GW of TPU capacity in 2027 and a multi-gigawatt Anthropic TPU commitment. For practitioners, hardware diversity at the data-center level influences deployment architectures, latency envelopes, model partitioning, and inference cost optimization.

What to watch

• Reported uptake by third-party operators and any published performance or power-efficiency comparisons between TPU 8i pods and Nvidia GPU clusters.
• Benchmarks for inference cost per token and latency at scale on coherent-shared-memory TPU topologies versus GPU-based sharded approaches.
• Integration and orchestration constraints operators describe when fitting TPUs into existing inference stacks.
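
For readers who want to evaluate such benchmarks when they appear, the latency and throughput figures in the list above are typically summarized along these lines. This is a minimal sketch, assuming per-request latencies have already been collected; the helper names and the sample numbers are hypothetical, and nearest-rank is only one of several common percentile definitions.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile for pct in (0, 100]."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def summarize_run(latencies_ms, tokens_generated, wall_clock_s):
    """Summarize one benchmark run: median and tail latency plus throughput."""
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p99_ms": percentile(latencies_ms, 99),
        "tokens_per_second": tokens_generated / wall_clock_s,
    }


# Hypothetical measurements, for illustration only.
print(summarize_run([10, 20, 30, 40], tokens_generated=1_000, wall_clock_s=10))
```

Comparing tail latency (p99) rather than the mean is a common choice for inference serving, since a small share of slow requests often dominates user-visible behavior.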

Google's merchant-sales debut is best read as an inflection in inference hardware procurement options rather than a guaranteed displacement of GPU incumbents; adoption at scale will depend on measured cost per inference, software maturity, and operator integration work.

Key Points

1. Google's move toward merchant TPU sales expands procurement options and pressures incumbent GPU suppliers on inference cost and vendor concentration.
2. The v8 generation splits into TPU 8t (training) and TPU 8i (inference); IO Fund flags large coherent shared memory as the key pod-level differentiator.
3. Adoption will hinge on benchmarked cost-per-token, power efficiency, and software-ecosystem maturity, not just headline architecture claims.

Scoring Rationale

Google moving TPUs into third-party data centers is a notable infrastructure shift that expands hardware options for inference and pressures GPU incumbents, corroborated by independent reporting on the TPU 8t/8i launch and merchant deals. It materially affects deployment economics and benchmarking priorities for ML practitioners and cloud operators, though near-term impact depends on measured cost-per-inference and ecosystem maturity.

Sources

Primary source and supporting public references used for this report.

Primary source: seekingalpha.com, "Google TPU V8 Vs. Nvidia: How Inference Is Rewriting The AI Market (NASDAQ:GOOGL)"

