Google Splits TPU 8 Into Training and Inference Chips

Google is splitting its next-generation TPU 8 family into two purpose-built chips: a training-focused accelerator (TPU 8t) and a cost-optimized inference accelerator (TPU 8i). Industry reporting indicates that the training chip is being co-designed with Broadcom under an extended multi-generation engagement, while the inference chip is being co-designed with MediaTek (Google has not publicly confirmed the design-partner split). The family will integrate with Google's Axion CPU line and rely on advanced packaging such as CoWoS paired with stacked HBM, putting pressure on foundry and packaging capacity. The split reflects a broader industry trend toward separate compute for training and inference. It is likely to reshape which suppliers win business, how much CoWoS capacity is needed, and the wider data-center component ecosystem.
Editor's note — Updated May 12, 2026
This article has been revised following a correction request from Google Cloud's communications team, relayed by its agency Mission North. Earlier versions referred to the chips by pre-launch industry codenames (TPUv8t / "Sunfish" and TPUv8i / "Zebrafish") that third-party analysts reported ahead of Google's official announcement. Google has since formally introduced its eighth-generation TPUs as TPU 8t (training) and TPU 8i (inference) at Google Cloud Next 2026, and all chip references in this piece now use the official product names. See Google's announcement, "Our eighth generation TPUs: two chips for the agentic era."
What happened
Google is splitting its eighth-generation TPU family into two distinct chips: a high-performance training accelerator and a cost-optimized inference accelerator. The training chip, TPU 8t, is reportedly being co-designed with Broadcom; the inference chip, TPU 8i, is reportedly being co-designed with MediaTek. Both attributions come from industry analysts (notably SemiAnalysis) and have not been publicly confirmed by Google. Google will continue to integrate the TPUs tightly with its Axion CPU line, which is based on Arm Neoverse N3 cores. JPMorgan commentary on the Broadcom engagement points to a multi-generation scope and substantial revenue upside for infrastructure vendors.
Technical details
The split separates both design goals and supply-chain flows. TPU 8t (training) prioritizes raw matrix throughput, multi-socket coherency, and high-bandwidth memory capacity. TPU 8i (inference) optimizes for die area, power, and cost per inference, likely with more aggressive low-precision quantization and latency-focused I/O. Both chips are expected to rely on advanced packaging and HBM stacks, pushing up demand for CoWoS-style integration and wafer-level interposers. Key technical points practitioners should note:
- Broadcom is expected to own custom SerDes, PCIe/NVLink-class interconnects, and the high-speed fabric for multi-die training nodes.
- MediaTek is positioned to optimize die area, power envelopes, and inference microarchitecture for cloud inference racks.
- Integration with Axion suggests Google will keep CPU-memory access and system orchestration tightly coupled to TPU scheduling and telemetry.
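The "likely more aggressive quantization" point for TPU 8i can be made concrete with a generic symmetric int8 quantization sketch. This is a textbook per-tensor scheme, not Google's method (TPU 8i's number formats are not detailed in the reporting), and the function names are hypothetical. It shows the basic trade: each stored value shrinks from 4 bytes (fp32) to 1 byte, at the cost of a rounding error bounded by half the scale.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Hypothetical helper: symmetric per-tensor int8 quantization.

    Maps the range [-max|x|, max|x|] onto the integer range [-127, 127].
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero tensor, which would otherwise give a zero scale.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp into the int8 range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Recover approximate float values from int8 codes and the shared scale."""
    return [x * scale for x in q]


def storage_bytes(n: int) -> Tuple[int, int]:
    """Bytes for n values as fp32 vs int8 (int8 payload plus one fp32 scale)."""
    return 4 * n, n + 4


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.031, 1.0]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print("int8 codes:", q, "scale:", scale, "max error:", max_err)
    print("fp32 vs int8 bytes for 1M values:", storage_bytes(1_000_000))
```

Production inference stacks typically refine this with per-channel scales or calibration data; the sketch only shows why lower precision cuts memory footprint and bandwidth per inference.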
Context and significance
This is a strategic move on three fronts. First, it reflects an industry-wide acknowledgment that training and inference workloads have diverged enough to justify specialized silicon rather than a one-size-fits-all accelerator. Second, working with major ASIC design partners such as Broadcom and MediaTek signals Google's pragmatic reliance on a partner-driven model, anchored by long-term agreements (LTAs), to reach scale instead of doing all ASIC design in-house. Third, it intensifies competition for packaging capacity, especially CoWoS capacity and HBM supply, both of which were already constrained by GPU and ASIC demand. JPMorgan-linked analysis referenced in market commentary projects that TPU-related hardware and networking revenue could become substantial in the second half of this decade, which helps explain why foundries and OSATs (outsourced semiconductor assembly and test providers) are recalibrating capacity.
Why it matters for practitioners
If you run cloud infra, ML platforms, or hardware procurement, expect divergent node designs for training and inference. Training clusters will be denser in HBM capacity and interconnect complexity, while inference clusters will prioritize cost-efficiency and power. Software teams must plan for different compilation targets, quantization paths, and scheduling policies between TPU 8t and TPU 8i nodes. Hardware-software co-design, telemetry, and runtime selection will become more critical to achieve utilization and cost targets.
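A minimal sketch of what divergent scheduling policies might look like: one function routes each job to a training or inference pool and picks a compile target and precision. Everything here is an assumption for illustration. The pool names, compile-target strings, and the bf16/int8 precision choices are hypothetical placeholders, not Google Cloud, GKE, or XLA identifiers.

```python
from dataclasses import dataclass
from enum import Enum


class Workload(Enum):
    TRAINING = "training"
    INFERENCE = "inference"


@dataclass(frozen=True)
class Job:
    name: str
    workload: Workload
    allow_int8: bool = False  # set only for models validated at int8


@dataclass(frozen=True)
class Placement:
    pool: str            # hypothetical node-pool label
    compile_target: str  # hypothetical compiler target label
    precision: str       # assumed precision policy, not a published default


def place(job: Job) -> Placement:
    """Route a job to a training or inference pool (illustrative policy only)."""
    if job.workload is Workload.TRAINING:
        # Training: HBM- and interconnect-heavy nodes; keep bf16 math.
        return Placement("tpu-8t-pool", "tpu8t", "bf16")
    # Inference: cost- and power-optimized nodes; quantize when the model allows it.
    precision = "int8" if job.allow_int8 else "bf16"
    return Placement("tpu-8i-pool", "tpu8i", precision)


if __name__ == "__main__":
    jobs = [
        Job("pretrain-llm", Workload.TRAINING),
        Job("chat-serving", Workload.INFERENCE, allow_int8=True),
        Job("eval-serving", Workload.INFERENCE),
    ]
    for job in jobs:
        print(job.name, place(job))
```

Keeping the policy in one function makes it straightforward to add further signals, such as latency SLOs or per-pool cost, once concrete TPU 8t and TPU 8i characteristics are available.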
What to watch
Watch for supply-chain bottlenecks in CoWoS and HBM, the specific interconnect protocols Broadcom implements, and MediaTek's microarchitecture choices for inference. Also monitor contract terms and whether the Broadcom relationship expands into networking components for Google's data centers. Together, these factors will determine deployment cadence and overall TCO for TPU 8 systems.
Scoring Rationale
This rearchitecture and the supplier allocations materially affect data-center hardware design, foundry and packaging demand, and competitive dynamics versus GPU vendors. It signals a significant industry shift but is not a paradigm-breaking research result.

