Anthropic clarifies Claude quota drain causes

Anthropic says recent user reports of accelerated quota depletion for Claude Code are not caused by its cache TTL tweaks. The company moved the prompt cache TTL from one hour back to five minutes for many requests earlier this year. Cache behavior affects token accounting: writing to the five-minute cache costs 25% more in tokens, writing to the one-hour cache costs 100% more, and reading from cache costs about 10% of base price. Developers report long, high-context sessions now burn quotas faster, especially when using the 1M-token context window on Claude Opus 4.6 and Sonnet 4.6. Anthropic maintains the TTL switch should not increase costs systemically, while third-party analysts identify session patterns and large-context cache misses as the real drivers of higher burn rates.
What happened
Anthropic confirmed the faster quota burn reported by some developers is not directly caused by its recent prompt cache default changes for Claude Code. The company had previously introduced a one-hour cache around February 1, then reverted many requests to a five-minute cache around March 7. Users began seeing accelerated quota depletion in March, particularly on workflows that hold large context between interactions using the 1M-token context window on Claude Opus 4.6 and Sonnet 4.6.
Technical details
Anthropic's statements and community analysis show how prompt caching interacts with token accounting. Key points:
- Writing to the five-minute cache costs 25% more in tokens, while writing to the one-hour cache costs 100% more.
- Reading from cache is roughly 10% of the base price, making cache hits far cheaper than recomputing context.
- Large 1M-token context windows make cache misses expensive because reprocessing high-context requests multiplies compute and token costs.
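The multipliers above can be turned into a simple worked example. This is an illustrative cost model, not Anthropic's actual billing logic: the base per-token price is assumed for demonstration, and the function name and structure are hypothetical.

```python
# Illustrative token-cost model using the surcharge multipliers reported above.
# BASE_PRICE is a hypothetical per-token input price, not an actual Anthropic rate.
BASE_PRICE = 3.00 / 1_000_000  # e.g. $3 per million input tokens (assumed)

WRITE_5MIN = 1.25   # writing to the 5-minute cache: 25% surcharge
WRITE_1HR  = 2.00   # writing to the 1-hour cache: 100% surcharge
READ_HIT   = 0.10   # reading cached tokens: ~10% of base price

def request_cost(tokens_written: int, tokens_read: int,
                 tokens_uncached: int, write_multiplier: float) -> float:
    """Cost of one request under this simplified model."""
    return BASE_PRICE * (tokens_written * write_multiplier
                         + tokens_read * READ_HIT
                         + tokens_uncached)

# A 200k-token context: the first request writes the cache; later
# requests read it, unless the TTL lapses and the write must be repaid.
first = request_cost(200_000, 0, 0, WRITE_5MIN)
hit   = request_cost(0, 200_000, 0, WRITE_5MIN)
miss  = request_cost(200_000, 0, 0, WRITE_5MIN)  # TTL lapsed: full rewrite
print(f"first write: ${first:.3f}, cache hit: ${hit:.3f}, cache miss: ${miss:.3f}")
```

The gap between a hit and a miss (here $0.06 vs $0.75 per turn) is why TTL expiry, not the TTL change itself, dominates burn rates in long sessions.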
Context and significance
The change in default TTLs is not a simple billing bug; it surfaces a design tradeoff between cache longevity and per-request write cost. Developers who run long interactive sessions or agentic workflows with subagents are sensitive to TTL expiry because they rely on repeated cache reads across a session. Jarred Sumner endorsed the community detective work but argued the five-minute TTL can be cheaper for many one-shot flows because fewer long-lived cache entries are written. Sean Swanson and others reported they only began hitting quota limits after the TTL reversion and expanded use of the 1M-token window. Claude Code creator Boris Cherny warned that cache misses at that scale are particularly costly when sessions lapse beyond the cache TTL.
What to watch
Monitor Anthropic for tooling or controls that expose cache TTL choices per client, clearer billing telemetry linking cache hits/misses to token usage, and guidance for optimizing session patterns and context management to avoid unexpected burn rates.
Scoring Rationale
This affects practitioners using Anthropic's developer tooling and large-context models, with tangible cost and workflow implications. It is a notable product-level issue rather than a sector-shifting event.