DeepInfra Raises $107M to Scale Inference Infrastructure

DeepInfra announced a $107 million Series B round to scale its purpose-built AI inference cloud, according to a company press release reported by Yahoo Finance and CityBiz (May 4-6, 2026). The round was co-led by 500 Global and angel investor Georges Harik, with participation from A.Capital Ventures, Crescent Cove, Felicis, Nvidia, Peak6, Samsung Next, Supermicro, and Upper90 (CityBiz, VentureBurn). DeepInfra reports processing nearly five trillion tokens per week and says token volume has grown 25x since its Series A (Yahoo Finance, VentureBurn). The company owns and operates GPU clusters across eight U.S. data centers and plans further global expansion; local reporting by the Silicon Valley Business Journal says the startup intends to expand its roughly 25-person engineering workforce after the round (VentureBurn, CityBiz, BizJournals).
What happened
DeepInfra announced a $107 million Series B funding round in early May 2026, per a company press release distributed on Yahoo Finance and reported by CityBiz and VentureBurn. The round was co-led by 500 Global and angel investor Georges Harik, with participation from A.Capital Ventures, Crescent Cove, Felicis, Nvidia, Peak6, Samsung Next, Supermicro, and Upper90 (CityBiz; VentureBurn). Per the company's public materials, DeepInfra processes nearly five trillion tokens per week and reports 25x token-processing growth since its Series A (Yahoo Finance; CityBiz; VentureBurn). Local coverage in the Silicon Valley Business Journal reports the startup plans to expand its roughly 25-person workforce following the financing (BizJournals).
Technical details
Per the company press materials, DeepInfra operates a purpose-built cloud platform for high-throughput AI inference and owns GPU clusters across eight U.S. data centers, with plans to add international sites as demand grows (VentureBurn; CityBiz). The company's public statements frame its stack as owning hardware through to APIs to provide predictable latency, lower cost, and stability compared with spot or rented capacity, and it cites support for open-source and agentic AI workloads (CityBiz; Yahoo Finance).
Editorial analysis
Companies building specialist inference infrastructure are responding to two observable pressures in the market: widespread adoption of high-throughput, agent-driven workloads, and the increasing parity and deployment of open-source models. Industry reporting places DeepInfra's raise in that context, with investors emphasizing inference as a defining layer of the AI stack (Yahoo Finance; 500 Global commentary cited in press materials). For practitioners, the sustained focus on token throughput, owned GPU capacity, and predictable latency reflects an operational tradeoff: control and performance versus the capital intensity of owning hardware.
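To put the reported scale in perspective, the "nearly five trillion tokens per week" figure from the press materials can be converted into an implied sustained throughput. This is a back-of-envelope sketch; only the weekly total comes from the article, and the derived rate is a rough average across the fleet.

```python
# Back-of-envelope: what "nearly five trillion tokens per week" implies
# as sustained throughput. The weekly figure is from the company's press
# materials; the derived per-second rate is a rough illustration.

TOKENS_PER_WEEK = 5e12
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604,800 seconds

tokens_per_second = TOKENS_PER_WEEK / SECONDS_PER_WEEK
print(f"Implied sustained throughput: {tokens_per_second:,.0f} tokens/s")
# Roughly 8.3 million tokens per second, averaged over the week.
```

Averaging over a week smooths out peaks, so actual peak load on the clusters would be meaningfully higher than this figure.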
Context and significance
A $107 million Series B for an inference-focused cloud provider is a material signal that some investors see production-scale inference as a distinct infrastructure market, complementary to GPU spot/compute marketplaces and hyperscaler offerings. The participation of Nvidia and systems-focused investors underscores the hardware-plus-software nature of the problem. For ML engineers and platform teams, the story highlights vendor maturation around latency, cost-per-token economics, and integration with open-source model ecosystems.
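The cost-per-token economics mentioned above can be sketched with a simple model. Every number below is a hypothetical placeholder, not a DeepInfra or market figure; the sketch only illustrates the shape of the owned-versus-rented tradeoff the article describes.

```python
# Illustrative cost-per-token comparison: owned vs. rented GPU capacity.
# All inputs are hypothetical placeholders chosen for illustration; none
# are DeepInfra figures.

def cost_per_million_tokens(hourly_gpu_cost, tokens_per_second, utilization):
    """Dollar cost per one million tokens at a given average utilization."""
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_gpu_cost / tokens_per_hour * 1e6

# Hypothetical scenario: owned hardware amortizes to a lower effective
# hourly cost, but only pays off at sustained high utilization.
owned = cost_per_million_tokens(hourly_gpu_cost=1.20,
                                tokens_per_second=2500, utilization=0.85)
rented = cost_per_million_tokens(hourly_gpu_cost=3.50,
                                 tokens_per_second=2500, utilization=0.60)
print(f"owned:  ${owned:.3f} per 1M tokens")
print(f"rented: ${rented:.3f} per 1M tokens")
```

Under these placeholder inputs, owned capacity comes out several times cheaper per token, which is the capital-intensity-for-control tradeoff the analysis section describes.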
What to watch
- Follow-up reporting on hiring and headcount changes to verify the scale-up in engineering capacity (BizJournals reported hiring plans).
- Expansion of the data-center footprint beyond the current eight U.S. sites and any announced partnerships with cloud or colo providers (VentureBurn; CityBiz).
- Benchmarks or third-party latency/cost comparisons demonstrating claimed advantages versus hyperscalers and spot capacity.
All quoted or numeric claims above are taken from the company's press materials and contemporary reporting by Yahoo Finance, CityBiz, VentureBurn, and the Silicon Valley Business Journal.
Scoring Rationale
The round is a notable funding event for inference-focused infrastructure and highlights a market trend toward purpose-built GPU clouds. It is relevant to ML engineers and platform teams but is not a frontier-model or regulatory milestone. The coverage dates from early May, which reduces recency.