Google Expands TPU Supply Chain With Marvell Partnership

Google is in talks with Marvell Technology to co-develop two custom chips: a memory processing unit (MPU) designed to pair with existing TPUs, and a new TPU optimized for inference. The move would add a third design partner alongside Broadcom and MediaTek, reflecting a strategy of supply-chain diversification and cost-targeted silicon for inference workloads. The MPU aims to shift memory-intensive operations off the accelerator die, reducing bandwidth pressure and improving latency for model serving. The inference-focused TPU would target the dominant, continuously scaling cost of inference, allowing Google to segment designs by performance, cost, and power efficiency.
What happened
Google is in talks with Marvell Technology to develop two custom AI chips: a memory processing unit and a new inference-optimized TPU. The discussions, not yet a signed contract, would add Marvell as a third design services partner alongside Broadcom and MediaTek, while fabrication is expected to remain with TSMC. The plan targets inference-first operating economics as Google ramps production of Ironwood, its seventh-generation TPU (TPU v7).
Technical details
The first chip is a dedicated memory processing unit (MPU) intended to pair with TPUs and offload memory-centric work through in-memory or near-memory processing techniques. Offloading can reduce host-to-accelerator bandwidth and on-die memory pressure by performing prefetching, compression, activation quantization, gather/scatter, and other data-movement operations in a near-memory device; a back-of-envelope sketch after the list below illustrates the bandwidth effect. The second chip is a next-generation TPU designed specifically for inference, complementing rather than replacing Ironwood.
- `Ironwood` / `TPU v7` is already positioned as an inference-era accelerator, scaling to 9,216 liquid-cooled chips per superpod and delivering exa-scale FP8 throughput at the pod level; public figures cite, for example, 192 GB of HBM and multi-petaflop-class FP8 arithmetic per chip.
- The MPU+TPU split mirrors a broader system-design pattern in which a smaller, cheaper companion ASIC handles memory-system functions, freeing the main accelerator to optimize for matrix compute density and energy efficiency.
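To make the bandwidth argument concrete, here is a minimal back-of-envelope sketch, assuming a hypothetical serving step in which the MPU quantizes FP16 activations to FP8 before they cross the interconnect. The shapes, data types, and the placement of compression on the MPU are illustrative assumptions, not details from the reported deal.

```python
# Hypothetical back-of-envelope model (not from the reported deal): estimate how
# much interconnect traffic a companion MPU could remove by quantizing FP16
# activations to FP8 before they cross to the accelerator.

def bytes_moved(tokens: int, hidden_dim: int, bytes_per_elem: float) -> float:
    """Bytes transferred for one batch of activations."""
    return tokens * hidden_dim * bytes_per_elem

# Illustrative serving step: 4,096 tokens with a hidden dimension of 8,192.
TOKENS, HIDDEN = 4_096, 8_192

baseline = bytes_moved(TOKENS, HIDDEN, 2.0)   # FP16 activations sent as-is
offloaded = bytes_moved(TOKENS, HIDDEN, 1.0)  # MPU quantizes to FP8 in transit

print(f"baseline : {baseline / 2**20:.1f} MiB per step")
print(f"offloaded: {offloaded / 2**20:.1f} MiB per step")
print(f"reduction: {1 - offloaded / baseline:.0%}")
```

Real gains would depend on which operations the MPU actually absorbs and on any metadata overhead the compression scheme introduces.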
Context and significance
The AI compute market is shifting from episodic, training-centric capacity to continuous, inference-driven costs that scale with usage. Google has signaled that inference will dominate its compute spend; adding a design partner focused on inference-optimized and cost-tiered silicon allows more aggressive product segmentation. This follows Broadcom's long-term agreement with Google through 2031 for high-performance TPU variants, and MediaTek's role in cost-optimized e variants at lower price points. The custom ASIC market is expanding rapidly, with projections showing robust growth; this deal, if finalized, would accelerate the trend of multi-vendor design stacks for hyperscale AI.
Why it matters for practitioners
Splitting memory functions onto an MPU can materially change system-level performance trade-offs. For ML engineers and infrastructure teams, the implications are lower serving latency, higher effective model size per dollar, and a new set of performance knobs around memory tiering, compression, and sparsity handling. For hardware teams, the MPU approach increases the emphasis on interposer, packaging, and NIC/networking co-design needed to preserve throughput and maintain low tail latency under load.
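As a concrete example of the memory-tiering knob, here is a hedged Python sketch of a placement policy that keeps hot, latency-critical tensors in accelerator HBM and spills colder, capacity-heavy state to MPU-attached memory. The tensor names, sizes, access counts, and threshold are invented for illustration; no such policy or API has been described publicly.

```python
from dataclasses import dataclass

@dataclass
class Tensor:
    name: str
    size_gb: float
    accesses_per_step: int  # how often the serving loop touches this tensor

def assign_tier(t: Tensor, hot_threshold: int = 10) -> str:
    """Keep frequently touched tensors in HBM; spill the rest to the MPU tier."""
    return "HBM" if t.accesses_per_step >= hot_threshold else "MPU"

# Invented serving-time state for a large model (sizes and counts are made up).
tensors = [
    Tensor("attention_weights", 12.0, 100),
    Tensor("kv_cache_recent",    8.0,  50),
    Tensor("kv_cache_tail",     40.0,   2),
    Tensor("embedding_table",   24.0,   1),
]

for t in tensors:
    print(f"{t.name:18s} -> {assign_tier(t)}")

hbm_gb = sum(t.size_gb for t in tensors if assign_tier(t) == "HBM")
total_gb = sum(t.size_gb for t in tensors)
print(f"HBM footprint: {hbm_gb:.0f} GB of {total_gb:.0f} GB total")
```

The point of the sketch is the shape of the trade-off: spilling cold state frees HBM for larger or more concurrent models, at the price of slower access on the cold path.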
What to watch
Contract finalization and timelines; whether Marvell's role is full-chip design, IP-block integration, or design services; silicon process-node choices and packaging strategy; and the cost and power targets for the inference TPU versus Ironwood. Also watch software support: the runtime, compiler, and graph-partitioning changes needed to offload memory ops to an MPU while preserving model accuracy and determinism.
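On the software question, the core change is a partitioning pass that routes memory-bound operations to the MPU. A toy sketch follows; the op names, graph, and memory-bound set are assumptions chosen purely for illustration, not any real compiler's behavior.

```python
# Toy partitioning pass: tag each op in a model graph as memory-bound or
# compute-bound and route memory-bound ops to the MPU. Op names and the
# memory-bound set are illustrative assumptions only.

MEMORY_BOUND = {"gather", "scatter", "prefetch", "quantize", "dequantize"}

graph = [
    ("embed_lookup", "gather"),
    ("q_proj",       "matmul"),
    ("kv_fetch",     "prefetch"),
    ("attention",    "matmul"),
    ("act_quant",    "quantize"),
    ("mlp",          "matmul"),
]

placement = {
    op_name: ("MPU" if op_type in MEMORY_BOUND else "TPU")
    for op_name, op_type in graph
}

for op_name, device in placement.items():
    print(f"{op_name:13s} -> {device}")
```

A production pass would also have to reason about transfer costs at partition boundaries, which is where the determinism and accuracy questions flagged above come in.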
Bottom line
This is a pragmatic supply-chain and microarchitecture move that signals Google is doubling down on inference economics. If the deal is executed, MPU+TPU pairings would broaden the hardware options for serving large models and push other hyperscalers and vendors toward similar heterogeneous ASIC strategies.
Scoring rationale
This is a notable infrastructure story: adding Marvell as a design partner signals pragmatic supply-chain diversification and a microarchitectural shift toward MPU+TPU heterogeneity, which matters to hardware and infrastructure planners. It is important but not paradigm-shifting, so it scores in the notable range.