Anthropic Confirms and Fixes Claude Performance Issues

Anthropic wrote in a detailed postmortem that three product-layer changes, one of them a caching bug, caused a broad decline in Claude Code quality that many developers reported in April 2026 (Anthropic postmortem; The Decoder). The company says all three issues were fixed in v2.1.116 as of April 20, 2026 (The Decoder; VentureBeat). Third-party benchmarks and developer analyses reported measurable accuracy and reasoning drops, with BridgeMind results cited by VentureBeat showing a fall from 83.3% to 68.3% for an Opus variant. Reporting from Fortune and others also highlights user backlash, questions about compute constraints, and wider trust erosion. A longer satirical timeline assembled on clawd.rip documents earlier legal and policy controversies involving Claude and Anthropic.
What happened
Anthropic published a technical postmortem and related communications attributing an observed decline in Claude Code quality to three separate product-layer issues, one of them a caching bug, and reported that fixes shipped in v2.1.116 on April 20, 2026 (Anthropic postmortem; The Decoder; VentureBeat). The issues described in reporting are a lowered default reasoning effort introduced on March 4, 2026; a caching optimization deployed on March 26, 2026 that inadvertently deleted reasoning history on each turn; and tighter text-length or verbosity restrictions that reduced sustained reasoning, according to The Decoder and VentureBeat. Third-party tests and community benchmarks documented noticeable performance drops; VentureBeat cites BridgeMind data showing accuracy for an Opus variant falling from 83.3% to 68.3%.
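To make the first of those issues concrete, here is a minimal, purely illustrative Python sketch of how a product-layer default can silently change behavior for every caller that never sets the option explicitly. All names (EFFORT_BUDGETS, DEFAULT_EFFORT, build_request) are hypothetical, not Anthropic's code; the only assumption, taken from the reporting, is that a reasoning-effort level maps to some thinking-token budget.

```python
# Hypothetical sketch of a silently lowered default; none of these names
# come from Anthropic's actual codebase.

EFFORT_BUDGETS = {"low": 1024, "medium": 4096, "high": 16384}

# Before March 4 the default was reportedly "high"; afterwards, "medium".
DEFAULT_EFFORT = "medium"

def build_request(prompt: str, effort: str | None = None) -> dict:
    """Assemble a request payload. Callers that never pass `effort`
    silently inherit whatever DEFAULT_EFFORT happens to be."""
    level = effort or DEFAULT_EFFORT
    return {"prompt": prompt, "thinking_budget_tokens": EFFORT_BUDGETS[level]}

if __name__ == "__main__":
    # Most integrations rely on the default, so a one-line change to
    # DEFAULT_EFFORT quietly cuts the reasoning budget for all of them.
    print(build_request("Refactor this module"))
```

Because the change lives in a default rather than in any caller's code, no downstream diff shows it, which is consistent with users noticing degraded output without any visible change on their side.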
Technical details
Per Anthropic's postmortem and contemporaneous reporting, the problems combined at the product layer rather than in the underlying inference API. The Decoder and VentureBeat report that the March 4 change lowered the default reasoning effort from "high" to "medium" to reduce latency for some users; that internal caching code introduced on March 26 wiped session reasoning history on each turn; and that a verbosity/prompt restriction further shortened context retention. Anthropic's engineering postmortem traces how these overlapping behaviors produced forgetfulness, repetition, and reduced reasoning depth in multi-turn developer workflows (Anthropic postmortem; The Decoder; VentureBeat).
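The caching failure mode is easy to reproduce in miniature. The sketch below, with entirely hypothetical names, shows a per-session cache where a turn-by-turn optimization rebuilds the cached object and drops the accumulated reasoning history, next to the straightforward fix. It is an analogy to the class of bug described, not Anthropic's implementation.

```python
# Toy reproduction of a cache update that wipes reasoning history on each
# turn. All names are hypothetical.

from dataclasses import dataclass, field

@dataclass
class Session:
    messages: list[str] = field(default_factory=list)
    reasoning_history: list[str] = field(default_factory=list)

cache: dict[str, Session] = {}

def update_session_buggy(session_id: str, message: str, reasoning: str) -> Session:
    # BUG: builds a fresh Session each turn, carrying messages forward but
    # starting reasoning_history from empty, so prior reasoning vanishes.
    prior = cache.get(session_id, Session())
    cache[session_id] = Session(messages=prior.messages + [message])
    cache[session_id].reasoning_history.append(reasoning)
    return cache[session_id]

def update_session_fixed(session_id: str, message: str, reasoning: str) -> Session:
    # FIX: mutate the cached session in place so both lists persist.
    session = cache.setdefault(session_id, Session())
    session.messages.append(message)
    session.reasoning_history.append(reasoning)
    return session

if __name__ == "__main__":
    for turn in ("plan", "implement", "test"):
        buggy = update_session_buggy("buggy", turn, f"reasoning for {turn}")
        fixed = update_session_fixed("fixed", turn, f"reasoning for {turn}")
    print(buggy.reasoning_history)  # ['reasoning for test'] (earlier turns lost)
    print(fixed.reasoning_history)  # all three entries retained
```

The buggy path still returns a plausible-looking session, with messages intact and history quietly gone, which is why this class of bug tends to surface as vague "the model got dumber" reports rather than hard errors.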
Context and significance
Editorial analysis: Companies operating large-scale model services routinely trade off latency, cost, and reasoning depth. Product-layer heuristics or cache optimizations intended to lower latency can materially change model outputs when they alter context retention or prompt scaffolding. The outcome reported here underscores two recurring industry themes: the difficulty of validating cross-platform infrastructure changes at scale, and the ease with which small defaults or caching bugs cascade into broad user-facing quality regressions.
Wider reaction and trust
Reporting in Fortune, VentureBeat, and other outlets documents developer frustration and reputational damage following a slow public response. Fortune republished a company statement citing elevated demand and compute pressure, and coverage frames the episode alongside prior legal and policy controversies catalogued in a satirical timeline on clawd.rip (Fortune; clawd.rip). Third-party benchmark drops amplified community scrutiny and fed narratives about transparency and resource limits.
What to watch
For practitioners: monitor reproducible benchmarks and changelogs for Claude Code and related Opus releases, and watch for follow-up telemetry or independent audits confirming that prior reasoning depth has been restored. Observers should also track the validation practices and test coverage that vendors publish for cross-hardware equivalence, since Anthropic cited multi-platform complexity in its postmortem (Anthropic postmortem). Finally, watch developer forums and third-party leaderboard results for ongoing signal about regression risk and user-facing latency-quality trade-offs; a sketch of a simple monitor follows.
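One lightweight way to act on that advice is to keep a fixed, versioned eval set and replay it after every vendor release, alerting when accuracy falls materially below a recorded baseline. The sketch below is a skeleton under stated assumptions: run_model is a stand-in to be replaced with a real SDK call, and the 83.3% baseline is borrowed from the article's BridgeMind figure purely as an example.

```python
# Skeleton of a release-over-release regression monitor. run_model is a
# placeholder; swap in a real model call. Thresholds are illustrative.

BASELINE_ACCURACY = 0.833   # example baseline (the pre-regression figure)
ALERT_THRESHOLD = 0.05      # flag drops larger than 5 percentage points

def run_model(prompt: str) -> str:
    """Stand-in for a real model call; replace with your vendor's SDK."""
    return "42"

def eval_accuracy(cases: list[tuple[str, str]]) -> float:
    """Score a list of (prompt, expected_answer) pairs by exact match."""
    correct = sum(run_model(prompt).strip() == expected for prompt, expected in cases)
    return correct / len(cases)

def check_regression(cases: list[tuple[str, str]]) -> None:
    acc = eval_accuracy(cases)
    if BASELINE_ACCURACY - acc > ALERT_THRESHOLD:
        print(f"REGRESSION: accuracy {acc:.1%} vs baseline {BASELINE_ACCURACY:.1%}")
    else:
        print(f"OK: accuracy {acc:.1%} (baseline {BASELINE_ACCURACY:.1%})")

if __name__ == "__main__":
    check_regression([("What is 6 * 7?", "42"), ("What is 40 + 2?", "42")])
```

Running the same pinned eval set on a schedule, rather than ad hoc after complaints, is what turns anecdotal "it feels worse" threads into a dated, reproducible signal.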
Scoring Rationale
A major provider admitted that its own engineering changes, including a caching bug, caused observable quality regressions affecting developers and benchmarks. This matters to practitioners tracking model reliability, deployment risk, and vendor transparency.