NVIDIA Blackwell Pressure Reduces AI Token Costs

June 12, 2026 | By LDS Team

Relevance Score: 7.8

Business Insider reports that the price of tokens, the unit used to meter AI usage, could fall sharply as new infrastructure and models reduce cost per token. The outlet cites an unnamed AI infrastructure CEO who said a "crop of new AI models later this year" will be more efficient and abundant. It also highlights a Silicon Data token-spending index that peaked near 2.06 in late May and fell to 1.75 by June 10, a change that Silicon Data CEO Carmen Li told Business Insider could reflect falling token prices. NVIDIA's corporate blog states that providers including Baseten, DeepInfra, Fireworks AI, and Together AI are running optimized inference stacks on NVIDIA Blackwell, and claims these stacks can cut cost per token by up to 10x versus the Hopper platform. Business Insider frames these trends as the key force that could drive token prices down.

What happened

Business Insider reports that AI token prices are likely to drop materially as newer models and infrastructure expand token supply and lower inference costs. Business Insider quotes an unnamed CEO of an AI infrastructure company who predicted "a crop of new AI models later this year that will be a lot better and more efficient," and cites a Silicon Data token-spending index that fell from approximately 2.06 in late May to 1.75 on June 10, a trend Carmen Li, CEO of Silicon Data, told Business Insider could indicate falling token prices. NVIDIA's official blog states that inference providers Baseten, DeepInfra, Fireworks AI, and Together AI have achieved up to 10x lower cost per token on the NVIDIA Blackwell platform compared with the Hopper generation.
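As a quick arithmetic check on the index figures quoted above, a move from roughly 2.06 to 1.75 works out to a decline of about 15%. The sketch below only reproduces that calculation; the function name `percent_change` is illustrative, and the sources do not describe the index's units or methodology.

```python
def percent_change(start: float, end: float) -> float:
    """Return the percentage change from start to end."""
    if start == 0:
        raise ValueError("start must be non-zero")
    return (end - start) / start * 100.0


# Index values as reported by Business Insider (2.06 is approximate).
late_may_peak = 2.06
june_10_value = 1.75

change = percent_change(late_may_peak, june_10_value)
print(f"Index change: {change:.1f}%")  # prints "Index change: -15.0%"
```

A drop of this size in a spending index could come from lower prices, lower usage, or a mix of both, which is why the article treats it as a directional signal only.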

Technical details

NVIDIA's blog frames the cost reduction as a combination of improved hardware-software co-design in the Blackwell platform plus optimized inference stacks run by third-party providers. The blog claims infrastructure and algorithmic efficiencies are reducing inference costs for frontier-level performance by up to 10x annually and presents customer examples, including a healthcare platform that reportedly cut inference costs by 10x using Baseten plus Blackwell.
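To make the "up to 10x" figure concrete, the sketch below applies a range of reduction factors to a hypothetical workload. The baseline price, the monthly token volume, and both function names are assumptions chosen for illustration, not figures from NVIDIA or Business Insider. Because the claim is "up to" 10x, the last row is a best case, and it assumes the provider passes the full infrastructure saving through to customer pricing.

```python
def reduced_price(price_per_million: float, factor: float) -> float:
    """Price per million tokens after dividing by a cost-reduction factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return price_per_million / factor


def monthly_cost(tokens_per_month: int, price_per_million: float) -> float:
    """Monthly spend in USD for a given token volume and price."""
    return tokens_per_month / 1_000_000 * price_per_million


# Hypothetical inputs, not taken from any source.
baseline_price = 2.00          # USD per million tokens (assumed)
tokens_per_month = 500_000_000  # monthly token volume (assumed)

for factor in (1, 2, 5, 10):
    price = reduced_price(baseline_price, factor)
    cost = monthly_cost(tokens_per_month, price)
    print(f"{factor:>2}x reduction: ${price:.2f}/M tokens, ${cost:,.2f}/month")
```

With these assumed inputs, the monthly bill falls from $1,000 at baseline to $100 at the full 10x factor; partial pass-through would land somewhere in between.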

Editorial analysis

Industry context

Companies and analysts have repeatedly observed that step-change improvements in inference efficiency and hardware utilization tend to compress the marginal cost of model runtime. Historical patterns show that when new accelerator generations and optimized stacks coincide with more efficient model architectures or quantization techniques, per-interaction costs can fall rapidly, shifting price pressure onto token-based billing models.

Implications for practitioners

For ML engineers and SRE teams, cheaper tokens change tradeoffs for model selection, batching, latency-vs-cost tuning, and where to host inference. Lower per-token cost favors higher-throughput services, larger context windows, and more aggressive sampling in production, while also increasing incentives to re-evaluate cost-accounting and observability for token consumption.

What to watch

Track independent benchmarks and third-party telemetry (for example, reproducible latency and throughput tests on Blackwell vs prior platforms), vendor pricing announcements that translate lower infrastructure costs into customer rates, and token-spend indexes such as the Silicon Data index for continued directional signals. Also watch open-source model releases and quantization/compilation tools that materially change inference FLOPs per token.
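When comparing throughput benchmarks across platforms, one common way to put the results on a cost basis is to divide an accelerator's hourly price by the number of tokens it generates in an hour. The sketch below assumes a single accelerator running at a steady utilization. The hourly prices, throughput figures, and names such as `cost_per_million_tokens` are placeholders for illustration, not measurements of Blackwell or Hopper.

```python
def cost_per_million_tokens(
    hourly_price_usd: float,
    tokens_per_second: float,
    utilization: float = 1.0,
) -> float:
    """Infrastructure cost in USD per million generated tokens."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_price_usd / tokens_per_hour * 1_000_000


# Placeholder benchmark results (assumed, not measured).
older_platform = cost_per_million_tokens(hourly_price_usd=3.00, tokens_per_second=1000)
newer_platform = cost_per_million_tokens(hourly_price_usd=6.00, tokens_per_second=8000)

print(f"Older platform: ${older_platform:.3f} per million tokens")
print(f"Newer platform: ${newer_platform:.3f} per million tokens")
print(f"Cost ratio: {older_platform / newer_platform:.1f}x")
```

In this made-up case the newer accelerator costs twice as much per hour but produces eight times the tokens, so its cost per token is about 4x lower. The same calculation applied to reproducible third-party measurements is one way to check vendor claims.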

Limitations

Business Insider's piece relies on a comment from an unnamed CEO and a short-term move in a single token-spending index; NVIDIA's blog is a vendor source highlighting partners and claimed gains. Neither source alone proves a market-wide, sustained token-price collapse; independent telemetry and provider pricing will be necessary to verify the scale and persistence of any decline.

Key Points

1. NVIDIA reports partners achieving up to 10x lower cost per token on Blackwell, a supply-side shock to inference economics.
2. Independent token-spend telemetry, such as the Silicon Data index, shows recent downward movement that may reflect cheaper inference.
3. Industry trends show step-change hardware and stack efficiency often compress per-interaction costs, shifting operational tradeoffs for ML teams.

Scoring Rationale

This is a major infrastructure development with direct operational impact: NVIDIA and partner claims of up to 10x token-cost reductions materially affect inference economics and deployment choices for ML teams. Independent verification and pricing pass-through will determine real-world impact.

Sources

Public references used for this report.

blogs.nvidia.com: Leading Inference Providers Achieve Lowest Token Cost With Open ...