NVIDIA Blackwell Adds Pressure to Lower AI Token Costs
Business Insider reports that the price of tokens, the units used to meter AI usage, could fall sharply as new infrastructure and models reduce cost per token. Business Insider cites an unnamed AI infrastructure CEO saying a "crop of new AI models later this year" will be more efficient and abundant, and highlights a Silicon Data token-spending index that peaked near 2.06 in late May and fell to 1.75 by June 10, a change Silicon Data CEO Carmen Li told Business Insider could reflect falling token prices. NVIDIA's corporate blog states that providers including Baseten, DeepInfra, Fireworks AI, and Together AI are running optimized inference stacks on NVIDIA Blackwell, and claims these stacks can cut cost per token by up to 10x versus the Hopper platform. Business Insider frames more efficient models and expanding infrastructure as the key forces that could drive token prices down.
What happened
Business Insider reports that AI token prices are likely to drop materially as newer models and infrastructure expand token supply and lower inference costs. Business Insider quotes an unnamed CEO of an AI infrastructure company who predicted "a crop of new AI models later this year that will be a lot better and more efficient," and cites a Silicon Data token-spending index that fell from approximately 2.06 in late May to 1.75 on June 10, a trend Carmen Li, CEO of Silicon Data, told Business Insider could indicate falling token prices. NVIDIA's official blog states that inference providers Baseten, DeepInfra, Fireworks AI, and Together AI have achieved up to 10x lower cost per token on the NVIDIA Blackwell platform compared with the Hopper generation.
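For scale, the reported index move can be expressed as a relative change. The sketch below uses only the two approximate readings cited by Business Insider; note that this is a token-spending index, so a roughly 15% drop in the index is not necessarily a 15% drop in per-token prices.

```python
def percent_change(start: float, end: float) -> float:
    """Return the relative change from start to end, in percent."""
    if start == 0:
        raise ValueError("start must be non-zero")
    return (end - start) / start * 100.0


# Approximate Silicon Data index readings as reported by Business Insider.
late_may_peak = 2.06
june_10 = 1.75

change = percent_change(late_may_peak, june_10)
print(f"Index change: {change:.1f}%")
```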
Technical details
NVIDIA's blog frames the cost reduction as a combination of improved hardware-software co-design in the Blackwell platform plus optimized inference stacks run by third-party providers. The blog claims infrastructure and algorithmic efficiencies are reducing inference costs for frontier-level performance by up to 10x annually and presents customer examples, including a healthcare platform that reportedly cut inference costs by 10x using Baseten plus Blackwell.
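To make "up to 10x annually" concrete, the sketch below converts an annual cost-reduction factor into an equivalent monthly multiplier. It assumes, purely for illustration, that the decline compounds evenly month over month; the blog does not describe the shape of the decline, and "up to" marks 10x as a best case rather than a typical one. The $1.00 starting price is hypothetical.

```python
def annual_to_monthly_multiplier(annual_reduction: float) -> float:
    """Monthly cost multiplier equivalent to an `annual_reduction`-fold drop per year,
    assuming the decline compounds evenly each month (an illustrative assumption)."""
    if annual_reduction <= 0:
        raise ValueError("annual_reduction must be positive")
    return annual_reduction ** (-1.0 / 12.0)


def projected_cost(initial_cost: float, annual_reduction: float, months: int) -> float:
    """Projected cost after `months` under the same even-compounding assumption."""
    return initial_cost * annual_to_monthly_multiplier(annual_reduction) ** months


m = annual_to_monthly_multiplier(10.0)
print(f"Monthly multiplier: {m:.4f} (~{(1 - m) * 100:.1f}% cheaper each month)")
# Hypothetical starting cost of $1.00 per million tokens.
print(f"After 12 months: ${projected_cost(1.00, 10.0, 12):.2f} per million tokens")
```

Under this assumption, a 10x annual reduction works out to roughly 17.5% lower cost each month.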
Industry context
Editorial analysis: Step-change improvements in inference efficiency and hardware utilization tend to compress the marginal cost of running a model. When a new accelerator generation and optimized serving stacks coincide with more efficient model architectures or quantization techniques, per-interaction costs can fall quickly, putting downward pressure on token-based pricing.
Implications for practitioners
Editorial analysis: For ML engineers and SRE teams, cheaper tokens change the tradeoffs in model selection, batching, latency-versus-cost tuning, and where to host inference. Lower per-token cost favors higher-throughput services, larger context windows, and generating more samples per request in production (for example, best-of-n), while also increasing the incentive to revisit cost accounting and observability for token consumption.
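One way to reason about these tradeoffs is a per-request cost model. The sketch below is a simplified illustration: the `Pricing` values are hypothetical (not any provider's published rates), the "cheaper" tier simply applies a 10x cut, and each extra sample is assumed to re-bill the full prompt (no prompt caching).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices in USD (not any real provider's rates)."""
    input_per_million: float
    output_per_million: float


def request_cost(p: Pricing, input_tokens: int, output_tokens: int, samples: int = 1) -> float:
    """Cost of one request. Each of `samples` completions is assumed to re-bill the
    full prompt (no prompt caching), a simplifying assumption."""
    per_sample = (input_tokens * p.input_per_million
                  + output_tokens * p.output_per_million) / 1_000_000
    return per_sample * samples


today = Pricing(input_per_million=2.00, output_per_million=8.00)
cheaper = Pricing(input_per_million=0.20, output_per_million=0.80)  # hypothetical 10x cut

baseline = request_cost(today, input_tokens=4_000, output_tokens=500)
bigger_context = request_cost(cheaper, input_tokens=32_000, output_tokens=500)
more_samples = request_cost(cheaper, input_tokens=4_000, output_tokens=500, samples=4)

print(f"Baseline, today's prices:         ${baseline:.4f}")
print(f"8x context, 10x lower prices:     ${bigger_context:.4f}")
print(f"4 samples, 10x lower prices:      ${more_samples:.4f}")
```

With these illustrative numbers, both an 8x larger prompt and four samples per request still cost less than the baseline request did before the price cut, which is the kind of shift that can change default serving configurations.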
What to watch
Editorial analysis: Track independent benchmarks and third-party telemetry (for example, reproducible latency and throughput tests on Blackwell vs prior platforms), vendor pricing announcements that translate lower infrastructure costs into customer rates, and token-spend indexes such as the Silicon Data index for continued directional signals. Also watch open-source model releases and quantization/compilation tools that materially change inference FLOPs per token.
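Throughput benchmarks only translate into cost claims once they are combined with an instance price. The sketch below shows that conversion under the simplifying assumption of a fully utilized instance at a sustained aggregate throughput; the hourly prices and tokens-per-second figures are hypothetical placeholders, not measurements of Blackwell or Hopper.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tokens_per_second: float) -> float:
    """Serving cost per million generated tokens for one instance running at a
    sustained aggregate throughput. Assumes the instance is fully utilized."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


# Hypothetical benchmark inputs for the same model on two platforms.
# Replace with your own reproducible measurements and actual instance pricing.
older = cost_per_million_tokens(hourly_cost_usd=10.0, tokens_per_second=5_000)
newer = cost_per_million_tokens(hourly_cost_usd=15.0, tokens_per_second=30_000)

print(f"Older platform: ${older:.3f} per 1M tokens")
print(f"Newer platform: ${newer:.3f} per 1M tokens")
print(f"Improvement: {older / newer:.1f}x")
```

Real-world figures will also depend on utilization, batch size, precision, and sequence lengths, which is why independent, reproducible benchmarks matter when evaluating vendor cost-per-token claims.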
Limitations
Business Insider's piece relies on an unnamed CEO's comment and on movement in a single third-party index; NVIDIA's blog is a vendor source highlighting partners and claimed gains. Neither source alone proves a market-wide, sustained token-price collapse; independent telemetry and provider pricing will be necessary to verify the scale and persistence of any decline.
Scoring Rationale
This is a major infrastructure development with direct operational impact: NVIDIA and partner claims of up to 10x token-cost reductions materially affect inference economics and deployment choices for ML teams. Independent verification and pricing pass-through will determine real-world impact.

