Token Pricing Inflates AI Usage Costs and Incentives

Token-based billing, the de facto metric for LLM usage, systematically incentivizes wasteful consumption and cost inflation. Counting tokens in and out is simple to meter but misaligns billing with useful work, rewarding verbosity, repeated retries, and engineering hacks that increase token burn. The Register labels this trend token incremental burn syndrome, or TIBS. For practitioners, the immediate consequences are higher operational costs, unpredictable billing spikes, and misaligned product design decisions that favor token-heavy features. Vendors favor tokens because they are measurable and opaque to many customers. The practical remedy requires alternative billing metrics, better observability, and rethinking API ergonomics so cost aligns with value delivered.
What happened
The Register argues that AI billing anchored to tokens has baked inflation and perverse incentives into modern LLM platforms. The opinion coins token incremental burn syndrome (TIBS) to describe progressively rising token consumption caused by design choices, retries, and feature creep, and notes that token counting is easy to implement but poor at reflecting useful work.
Technical details
The piece highlights why tokens became the default metric: they are straightforward to count for both prompt and response. Platforms meter tokens going in and tokens coming out, and often apply simple budget checks (such as an ntokens_left counter) to gate usage. That simplicity makes billing mechanics predictable, but it also creates incentives that drive inefficiency. Practitioners see several measurable artifacts:
- inflated prompt engineering to coax longer outputs
- repeated inference retries and higher tokens per interaction
- vendor-side additions of "slop" or metadata that increase output tokens
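The metering mechanics described above can be sketched in a few lines. This is a minimal illustration, not any vendor's API: the function names (count_tokens, gate_request, bill) and the whitespace tokenizer are assumptions for demonstration; real platforms use model-specific tokenizers.

```python
# Hypothetical sketch of token-based metering. All names are illustrative;
# real platforms use model-specific tokenizers and richer billing logic.

def count_tokens(text: str) -> int:
    # Crude whitespace proxy; actual token counts come from a tokenizer.
    return len(text.split())

def gate_request(prompt: str, ntokens_left: int, max_output_tokens: int) -> bool:
    # Simple budget check of the kind the article describes: allow the call
    # only if the prompt plus the output cap fits the remaining budget.
    return count_tokens(prompt) + max_output_tokens <= ntokens_left

def bill(prompt: str, response: str, rate_in: float, rate_out: float) -> float:
    # Input and output tokens are typically priced separately, so every
    # extra output token raises the bill directly.
    return count_tokens(prompt) * rate_in + count_tokens(response) * rate_out
```

The last function is where the perverse incentive lives: a more verbose response, or a retry that reruns the same prompt, increases revenue with no change in useful work.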
Alternative billing approaches to consider
- outcome or task-based pricing tied to successful completions or business KPIs
- compute-time or GPU-second billing, aligning cost to raw compute consumed
- session or conversation pricing, which encourages stateful, efficient interactions
- feature-tier pricing that charges for capabilities rather than token volume
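To make the incentive difference concrete, here is a toy comparison of the first alternative against token billing. All rates, token counts, and names are made up for illustration; the point is only the shape of the curves, not the numbers.

```python
# Toy comparison of token-based vs. outcome-based billing.
# All rates and token counts below are illustrative, not real vendor prices.

TOKEN_RATE = 0.00001        # dollars per token (made up)
PRICE_PER_SUCCESS = 0.05    # flat price per completed task (made up)

def token_cost(attempts):
    # Under token billing, every retry and every extra output token is charged.
    return sum((tokens_in + tokens_out) * TOKEN_RATE
               for tokens_in, tokens_out in attempts)

def outcome_cost(succeeded: bool):
    # Under outcome billing, only a successful completion is charged,
    # so retries and verbosity do not inflate the bill.
    return PRICE_PER_SUCCESS if succeeded else 0.0

# One task that took three increasingly verbose retries:
attempts = [(500, 800), (700, 1200), (900, 2000)]
print(token_cost(attempts))      # grows with every retry and longer output
print(outcome_cost(True))        # flat, regardless of how it got there
```

Under token billing the retries push the cost above the flat outcome price; under outcome billing the vendor, not the customer, absorbs the cost of inefficiency, which is exactly why the incentives flip.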
Context and significance
This is a product-design and platform-economics problem, not a research limitation. Paying per token is analogous to paying programmers per keystroke; it rewards verbosity and inefficiency rather than value. For ML engineers and platform owners, token-based pricing affects architecture choices: developers may add caching, batch requests, or local models to avoid expensive API calls, shifting complexity back onto teams. For vendors, tokens remain attractive because they are auditable and easy to meter, and because customers lack mature observability to map tokens to business outcomes.
What to watch
Expect incremental changes: better observability (token-to-KPI mapping), hybrid pricing experiments, and vendor features that hide token costs behind higher-level primitives. The more consequential shift will be when one major provider pilots outcome- or compute-based pricing at scale; that could reset industry norms and reduce TIBS.
Scoring Rationale
The story highlights a widespread, practical problem that affects engineering costs and product design across LLM deployments. It is notable for practitioners but not a frontier technical breakthrough, so it rates in the 'notable' band.