Anthropic and OpenAI Face AI Token Pricing Crisis

According to CNBC, the core metric for AI consumption -- tokens -- is producing distorted demand signals as firms and enterprises optimize for volume rather than outcome. CNBC reports Anthropic has moved away from flat-rate enterprise pricing toward per-token billing, with its published rates at $5 per million input tokens and $25 per million output tokens. The article quotes enterprise and vendor voices, including NVIDIA CEO Jensen Huang and Databricks CEO Ali Ghodsi, describing internal leaderboards and incentives that reward token burn. CNBC frames Anthropic's per-token pricing as better aligned with actual usage, while implying flat-rate approaches may inflate apparent demand and justify large infrastructure spend.
What happened
According to CNBC, tokens are the basic unit of AI usage and token consumption has become a distorted metric for demand. CNBC reports Anthropic has moved away from flat-rate enterprise pricing and toward per-token billing, with published rates of $5 per million input tokens and $25 per million output tokens. CNBC also reports that some companies track employee token usage on internal leaderboards; the article quotes NVIDIA CEO Jensen Huang saying he would be "deeply alarmed" if an engineer earning $500,000 was not using at least $250,000 worth of compute. CNBC quotes Ali Ghodsi, CEO of Databricks, warning that teams can easily inflate token consumption without producing value.
Editorial analysis - technical context
Industry-pattern observations: Token-based metering is the dominant billing primitive across cloud and API-driven LLM services because it maps closely to compute and memory costs at scale. Companies that publish explicit per-token rates make cost-per-inference transparent, which helps customers do unit economics. Conversely, flat-rate or seat-based enterprise contracts can mask marginal costs and create incentives to maximize volume. For data teams and ML engineers, this affects experiment design, sampling strategies, prompt engineering, and the tradeoff between latency, context length, and model size.
Context and significance
Public reporting places the pricing debate against a backdrop of large infrastructure investments justified by rising token consumption. If token metrics are inflated by internal incentives or poor instrumentation, reported usage growth can overstate sustainable revenue and compute demand. For procurement and finance teams, per-token pricing reduces ambiguity in chargeback models; for vendors, pricing choices influence how customers discover and internalize marginal costs.
What to watch
For practitioners: monitor vendor pricing pages and published per-token rates when modeling production costs. Watch for more vendors publishing differentiated input vs output pricing, tiered rates by context length, and clearer documentation of how agentic or retrieval-heavy workflows affect token burn. Observers should also track whether enterprise contracts shift toward hybrid pricing that mixes seats, committed spend, and metered usage, and whether benchmarking efforts begin to standardize token accounting for multi-step, agentic workloads.
Scoring Rationale
Pricing structure affects cost modeling, vendor selection, and procurement for ML teams, making this practically relevant. The story is not a frontier technical breakthrough, and the original reporting is several weeks old, lowering immediacy.
Practice interview problems based on real data
1,500+ SQL & Python problems across 15 industry datasets — the exact type of data you work with.
Try 250 free problems

