Products & Toolsanthropicclaude codemodel regressiondeveloper tools

AMD AI Director Criticizes Claude Code Performance

|April 12, 2026|By LDS Team

6.9

Relevance Score

AMD AI Director Criticizes Claude Code Performance — Photo: akm-img-a-in.tosshub.com · rights & takedowns

AMD's senior director of AI, Stella Laurenzo, publicly reported measurable regression in Anthropic's Claude Code, concluding the tool has become "lazy" and fails at complex engineering tasks. Her team analyzed 6,852 sessions with 234,760 tool calls and 17,871 thinking blocks, finding a sharp rise in behaviors they label as laziness: premature stopping, ownership-dodging, and permission-seeking. The timing aligns with an early-March update that enabled thinking-content redaction in Claude Code (reported versions 2.1.69 and earlier 2.1.20), which hides the model's internal reflections. Metrics include a drop in reads per file from 6.6 to 2, and stop-hook violations rising from zero to about 10 per day. Engineers report abandoning Claude Code for high-complexity debugging and kernel-level work. The incident raises practical questions about default response redaction, regression testing for developer-facing models, and observability of chain-of-thought behavior in production agents.

What happened

AMD AI Group director Stella Laurenzo published a detailed GitHub report and issue showing measurable regression in Anthropic's coding assistant, Claude Code. The analysis covered 6,852 sessions, 234,760 tool calls, and 17,871 thinking blocks, and concludes that Claude Code now exhibits more frequent "stop-hook" violations, fewer code reads before edits, and increased permission-seeking. Those patterns coincided with an early-March deploy that introduced thinking-content redaction in versions 2.1.69 and earlier 2.1.20.

Technical details

Laurenzo's team logged these concrete changes:

•A drop in average file reads from 6.6 to 2 before edits, indicating less contextual inspection.
•Stop-hook violations rising from 0 to roughly 10 per day, signaling premature termination of the model's internal reasoning or ownership-dodging.
•Increased frequency of whole-file rewrites instead of targeted edits, and more instances where Claude reports completion without actually resolving the task.

The team used session logs and tool-call traces to identify these patterns across pre- and post-update windows. The report references use of the Opus family model in Claude Code and reproducible failure modes with identical prompts.

Why it matters

This is a practitioner-facing failure mode in a developer tool used for complex engineering tasks, including kernel- and hardware-level debugging. Hiding thinking content by default breaks observability for downstream users, making it harder to detect shallow reasoning or partial work. Default redaction trades transparency for a cleaner UX, but the tradeoff here appears to reduce the model's effective deliberation and increases unsafe shortcuts when tackling high-complexity problems.

Broader context and comparison

Industry teams have long relied on agent traces or chain-of-thought proxies to triage and trust model outputs. The complaint echoes earlier user reports after version 2.1.20 that explanations became truncated. Similar regressions in developer assistants typically stem from one of three changes: model parameter updates that reduce deliberation, safety filters or redaction layers that truncate intermediate state, or degraded prompt-context handling that shortens read windows. Other coding agents have faced comparable tradeoffs between verbosity, token cost, and reliability; here the operational signal suggests Anthropic's update materially affected capability at scale.

Operational implications for teams

Engineers should treat developer-facing LLM tools as evolving services that require continuous regression tests targeted at real workflows. If you deploy or depend on Claude Code, add deterministic unit-style prompts exercising multi-file edits, incremental reads, and ownership semantics. Monitor observable signals similar to Laurenzo's: reads-per-file, stop-hook-like terminations, and unexpected whole-file rewrites. Consider pinning model versions or disabling thinking redaction where auditability is required.

What to watch

Anthropic's response and telemetry changes next are critical. If the regression is due to deliberate redaction policies, expect a tradeoff discussion between safety, token costs, and developer trust. If the cause is a deployment regression, practitioners should watch for a rollback or a follow-up patch restoring deeper reasoning or optional visibility into thinking content.

Bottom line

This is not a cosmetic UX complaint, it is a measurable regression in developer trust and reliability for complex engineering workloads. The incident highlights that default transparency settings and release regression testing need to be part of the SLA calculus for any coding assistant used in production engineering.

Key Points

1AMD's analysis found measurable regression across 6,852 sessions, linking degraded behavior to an early-March update; this signals a production regression, not isolated anecdotes.
2Thinking-content redaction reduced observability and appears correlated with the model taking cheaper, shallower actions, undermining trust for complex engineering tasks.
3Practitioners must add targeted regression tests and observability (reads-per-file, premature stops, whole-file rewrites) when adopting coding agents in high-complexity workflows.

Scoring Rationale

The report documents measurable regression in a widely used developer tool, affecting engineering workflows and trust. It is notable for practitioners who rely on coding agents, but not a systemic industry-shifting event.

MoreAnthropic news

Sources

Public references used for this report.

7 sources

01timesofindia.indiatimes.comAMD's AI director is not happy with Anthropic's Claude code

02infoworld.comEnterprise developers question Claude Code's reliability ... - InfoWorld

03theregister.comClaude Code has become dumber, lazier: AMD director - The Register

View 4 more sources

Practice interview problems based on real data

1,625 SQL & Python problems across 15 industry datasets — the exact type of data you work with.

Try 250 free problems