Anthropic Fixes Claude Code Performance Regression

Anthropic identified and fixed three product-layer changes that degraded Claude Code performance for some users. The company traced the regressions to a default reasoning-effort downgrade, a session-caching bug that cleared prior "thinking" every turn, and a system-prompt change that reduced verbosity and harmed code quality. Anthropic says the underlying API and inference layer were not affected and rolled back the changes in a patch (v2.1.116) on April 20, 2026, while resetting usage limits. The episode highlights how UI, agent harnesses, and prompt engineering can cause outsized user-facing regressions even when core model parameters remain unchanged.
What happened
Anthropic investigated complaints that Claude Code had become noticeably worse and found three separate product-layer changes that together produced a perceptible quality drop. The company confirmed the fixes were deployed in patch v2.1.116 on April 20, 2026, and said the API and inference layer were not impacted. "We never intentionally degrade our models," Anthropic wrote in its post-mortem.
Technical details
The three root causes were a change to default settings, a session-caching bug, and an added system prompt. Each affected a different slice of traffic and set of model variants, producing an inconsistent user experience.
- On March 4, Anthropic changed Claude Code's default reasoning effort from high to medium to reduce latency spikes in high mode; the tradeoff reduced reasoning depth for Sonnet 4.6 and Opus 4.6 and was reverted on April 7.
- On March 26, a background change intended to clear older "thinking" state for idle sessions introduced a bug that cleared that state on every turn for the whole session, making the agent forget prior context and repeat itself; it was fixed on April 10.
- On April 16, Anthropic added a system-prompt instruction to reduce verbosity; combined with other prompt edits, it materially harmed coding quality and was reverted on April 20. This change affected Sonnet 4.6, Opus 4.6, and Opus 4.7.
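The session-caching failure in the second bullet is a classic dropped-guard bug. A minimal sketch of the failure mode, using hypothetical names and structure (not Anthropic's actual code):

```python
from dataclasses import dataclass, field
import time

IDLE_THRESHOLD_S = 30 * 60  # hypothetical: clear thinking only after 30 min idle

@dataclass
class Session:
    thinking: list = field(default_factory=list)  # cached reasoning from prior turns
    last_active: float = field(default_factory=time.time)

def on_turn_buggy(session: Session) -> None:
    # Bug: intended as "clear stale thinking for idle sessions", but the
    # idle check is missing, so prior reasoning is wiped on EVERY turn.
    session.thinking.clear()
    session.last_active = time.time()

def on_turn_fixed(session: Session) -> None:
    # Fix: only clear cached thinking when the session has actually been idle.
    if time.time() - session.last_active > IDLE_THRESHOLD_S:
        session.thinking.clear()
    session.last_active = time.time()

s = Session(thinking=["plan: refactor module"])
on_turn_fixed(s)
assert s.thinking == ["plan: refactor module"]  # active session keeps context
on_turn_buggy(s)
assert s.thinking == []  # context lost on every turn -> agent repeats itself
```

The agent-visible symptom (forgetting prior context and repeating itself) follows directly: each turn starts from an empty reasoning cache even mid-conversation.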
Why these changes matter for practitioners
These are product-harness issues rather than model-weight changes. That means a stable inference stack can still produce degraded outputs when upstream wrappers, default parameters, session management, or system prompts are altered. Third-party testers and developers reported measurable drops; VentureBeat cited a BridgeMind benchmark slide showing Opus 4.6 accuracy falling from 83.3% to 68.3%, illustrating how user-facing regressions can be large and sudden when multiple product-layer adjustments interact.
Operational and engineering lessons
The incident exposes several brittle areas in model deployment pipelines. First, default parameter changes (for example, reasoning effort) are a high-leverage knob and should be canaried separately across traffic slices and evals. Second, session hygiene and caching require stricter end-to-end tests; a change meant to improve latency created a persistent correctness bug. Third, system prompts and verbosity constraints need targeted functional tests that include domain-specific evals such as code synthesis and long-form reasoning. Anthropic reset usage limits and said it will change its processes to reduce the chance of similar regressions.
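Canarying a default-parameter change as described above reduces to two pieces: deterministic bucketing and an eval-score gate. The following sketch is illustrative; the bucketing scheme, fraction, and regression threshold are assumptions, not any vendor's actual pipeline:

```python
import hashlib

def assign_arm(user_id: str, canary_fraction: float = 0.05) -> str:
    # Deterministic bucketing: the same user always lands in the same arm,
    # so canary metrics are not confounded by users flipping between arms.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 10_000
    return "canary" if bucket < canary_fraction * 10_000 else "control"

def gate_rollout(control_score: float, canary_score: float,
                 max_regression: float = 0.02) -> bool:
    # Hold the rollout if the canary's eval score regresses more than the
    # allowed margin relative to control on the same eval suite.
    return canary_score >= control_score - max_regression

# A drop on the scale reported for Opus 4.6 (0.833 -> 0.683) fails the gate.
assert gate_rollout(0.833, 0.683) is False
assert gate_rollout(0.833, 0.825) is True
```

The key design choice is gating on domain-specific eval scores (code synthesis, long-form reasoning) rather than latency alone; the March 4 change optimized latency and would have passed a latency-only gate.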
Context and significance
The episode reinforces a recurring theme in production LLM systems: the distinction between model-core stability and product-layer fragility. Competitors and customers watch such incidents closely because they affect trust, SLAs, and benchmark claims. For teams building agents, the takeaway is to instrument and benchmark not only the model API but the full harness, including prompt mutations, default-effort semantics, session state management, and UI-level defaults.
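Instrumenting the full harness, as suggested above, means logging every product-layer input alongside eval results so a score drop can be attributed to a specific prompt or default change. A hypothetical sketch of that fingerprinting idea:

```python
import hashlib
import json
import time

def harness_fingerprint(system_prompt: str, defaults: dict) -> str:
    # Hash everything the product layer contributes to a request, so eval
    # results can be attributed to a specific harness configuration even
    # when the underlying model and API are unchanged.
    blob = json.dumps({"prompt": system_prompt, "defaults": defaults},
                      sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()[:12]

def record_eval(score: float, system_prompt: str, defaults: dict) -> dict:
    return {
        "ts": time.time(),
        "fingerprint": harness_fingerprint(system_prompt, defaults),
        "score": score,
    }

before = record_eval(0.83, "Be concise.", {"reasoning_effort": "high"})
after = record_eval(0.68, "Be concise.", {"reasoning_effort": "medium"})
assert before["fingerprint"] != after["fingerprint"]  # change visible in logs
```

With fingerprints attached to eval runs, a regression like the ones described here shows up as a score drop correlated with a fingerprint change rather than an unexplained model failure.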
What to watch
Expect Anthropic to publish follow-up process changes and improved canarying and telemetry. Independent benchmarks and developer reports will be the next signal to confirm restoration of prior quality. Also watch whether other vendors adopt stricter guardrails for default-effort knobs and system-prompt rollouts, since these are common vectors for regressions.
Scoring Rationale
This is a notable operational incident: it did not change model weights but exposed high-risk product-layer failure modes that affect developer trust and enterprise deployments. It does not rise to industry-shaking model research news, but the practical implications for productionizing LLMs are significant.