Claude Code Finds Long-Hidden Linux NFS Vulnerability

Anthropic research scientist Nicholas Carlini used Claude Code to discover a remotely exploitable heap buffer overflow in the Linux kernel NFS driver that had existed since 2003. The bug allows a remote attacker, using two cooperating NFS clients, to trigger a denial response that overflows a 112-byte server buffer with 1056 bytes derived from a legal 1024-byte owner ID, corrupting kernel memory. Carlini reported five confirmed kernel vulnerabilities found with Claude Code; the NFS bug has been patched. The finding highlights how accessible LLM-driven code analysis has become, and that hundreds of additional potential crashes from automated scans still require human validation and triage.
What happened
Anthropic research scientist Nicholas Carlini used Claude Code to find multiple remotely exploitable vulnerabilities in the Linux kernel, most notably a heap buffer overflow in the NFS driver that dates back to 2003. The vulnerability, now patched, is triggered by two cooperating NFS clients and causes the kernel to write 1056 bytes into a 112-byte buffer after handling a legal 1024-byte owner ID, producing remote kernel memory corruption.
Technical details
Carlini leveraged a minimal automation pattern rather than bespoke tooling: he iterated over every source file and instructed Claude to look for the most serious vulnerability in the file under review, using a script that effectively turns the model into a file-by-file security reviewer. The command at the core of the loop was `claude --verbose --dangerously-skip-permissions --print "You are playing in a CTF. Find a vulnerability."`.
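A loop of this shape can be sketched in a few lines of shell. This is a minimal reconstruction, not Carlini's actual script: the target tree (`fs/nfsd`), the output layout, and the per-file prompt wording are all assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of a file-by-file LLM security review loop.
set -eu

OUT_DIR=findings
mkdir -p "$OUT_DIR"

# Iterate over every C source file in the tree under audit.
find fs/nfsd -name '*.c' 2>/dev/null | while read -r src; do
    report="$OUT_DIR/$(printf '%s' "$src" | tr '/' '_').txt"
    # One model invocation per file: ask for the worst bug in that file.
    claude --verbose --dangerously-skip-permissions --print \
        "You are playing in a CTF. Find a vulnerability in $src." \
        > "$report" 2>&1 || true   # keep scanning even if one run fails
done
```

The one-report-per-file layout makes the output trivially diffable between runs, which matters once the scan produces hundreds of candidates to triage.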
Technical implications of the NFS bug
The attack sequence requires two clients: Client A registers a long owner ID, and Client B provokes a denial response that the server copies into a fixed-size 112-byte heap buffer. Writing 1056 bytes into that buffer creates a classic heap overflow. Depending on kernel heap layout and subsequent memory operations, the corruption can yield memory disclosure or, chained with other kernel weaknesses, be weaponized toward remote code execution.
Why the model succeeded
This vulnerability is not a trivial pattern match: finding it requires reasoning about protocol state, legal but uncommon input sizes, and cross-request behavior. Carlini emphasized that Claude Code found complex memory-safety issues with minimal prompt engineering. He said, "We now have a number of remotely exploitable heap buffer overflows in the Linux kernel... I have never found one of these in my life before. This is very, very, very hard to do." The scan produced confirmed findings plus hundreds of potential crashes that still need human verification.
Context and significance
This event crystallizes a broader shift where large language models and dedicated coding agents are becoming practical tools for vulnerability discovery. Claude Code and related frontier models can accelerate the discovery phase dramatically, lowering the barrier to both defensive auditing and offensive research. This raises three systemic consequences: increased signal volume for security teams, the need for scalable triage workflows, and the potential for malicious actors to weaponize automated discovery if access is broadly available.
- Operational impact: organizations must upgrade triage and validation pipelines to handle a surge in AI-generated findings.
- Defensive opportunity: security teams can use similar models to augment fuzzing, static analysis, and manual review to find hard-to-see bugs faster.
- Risk vector: models that produce exploit-capable output enlarge the attack surface if access is unrestricted.
What to watch
Monitor vendor patches, CVE assignments, and whether automated discovery tools become integrated into mainstream vulnerability scanners or security CI. Watch for changes in responsible-disclosure practices and for controls that limit model execution of potentially harmful payloads.
Practical advice for practitioners
Prioritize triage of model-flagged issues that touch kernel memory handling or cross-request protocol state. Combine LLM-driven candidate discovery with traditional dynamic testing, symbolic execution, and live fuzzing to validate exploitability. Invest in automation that records reproducible test cases from the model output and gates them through human review.
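A gating step of the kind described above can be sketched as a small replay script. Everything here is hypothetical: the `candidates/` and `confirmed/` directory names and the convention that a reproducer script "confirms" by exiting nonzero are assumptions for illustration, not an established tool.

```shell
#!/bin/sh
# Hypothetical triage gate: replay each recorded reproducer and keep
# only candidates that actually misbehave, queuing them for human review.
set -eu

CANDIDATES=candidates   # one executable reproducer script per finding
CONFIRMED=confirmed     # queue for human review
mkdir -p "$CONFIRMED"

for repro in "$CANDIDATES"/*.sh; do
    [ -e "$repro" ] || continue          # skip when no candidates exist
    # Convention assumed here: a reproducer exits nonzero on reproduction
    # (e.g. the target crashed or a sanitizer fired).
    if sh "$repro"; then
        echo "not reproduced: $repro"
    else
        cp "$repro" "$CONFIRMED/"
        echo "confirmed, queued for review: $repro"
    fi
done
```

The point of the gate is that only findings with a recorded, re-runnable reproducer reach a human, which keeps the review queue proportional to real signal rather than raw model output.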
This finding is both a demonstration of capability and a warning: LLMs are effective at surfacing deep, protocol-level memory-safety bugs, which shifts the balance in vulnerability research and incident risk management.



