Lean Runtime Reveals Buffer Overflow Despite Verification
An AI-assisted verification project produced lean-zip, a fully verified Lean implementation of zlib whose core correctness theorem guarantees that decompression is the exact inverse of compression for inputs under 1 GB. The author then pointed a Claude agent and standard fuzzing and memory-checking tools at the code, and over a weekend the campaign ran 105 million fuzzing executions. The verified application code showed zero memory issues, but the experiment uncovered a heap buffer overflow in the Lean 4 runtime, specifically in lean_alloc_sarray. The result highlights a critical trust boundary: formal proofs about program logic do not automatically cover the runtime, compiler, or toolchain that execute the code.
What happened
lean-zip is a Lean-verified implementation of zlib. Its central theorem, zlib_decompressSingle_compress, proves that decompression inverts compression for any byte array under 1 GB. Over a weekend the author used a Claude agent plus fuzzing and memory-checking tools and ran 105 million fuzzing executions. The verified application showed zero memory vulnerabilities, but the run exposed a heap buffer overflow in the Lean 4 runtime function lean_alloc_sarray.
Technical details
The core theorem in lean-zip is stated as zlib_decompressSingle_compress (data : ByteArray) (level : UInt8) (hsize : data.size < 1024 * 1024 * 1024) : ZlibDecode.decompressSingle (ZlibEncode.compress data level) = .ok data. It asserts end-to-end correctness: for any input smaller than 1024 * 1024 * 1024 bytes (1 GiB) and any compression level, decompressing the compressed output returns exactly the original data. The verification covers Lean-level logic and data structures, not the native runtime implementation. The fuzzing toolchain included:
- AFL++
- AddressSanitizer
- Valgrind
- UndefinedBehaviorSanitizer (UBSan)
These found no issues in the Lean-written code but did trigger a heap overflow in the runtime allocation path, implicating native code that implements Lean's memory layout and allocation routines.
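For readability, the theorem statement quoted earlier in this section can be laid out as a Lean declaration. This is a layout of the statement only: the theorem keyword and the line breaks are added here, the proof is not shown, and the snippet will not compile outside lean-zip because ZlibDecode and ZlibEncode are defined there.

```lean
-- Statement as quoted in the article; the proof lives in lean-zip.
-- hsize bounds the input below 1024 * 1024 * 1024 bytes (1 GiB).
theorem zlib_decompressSingle_compress
    (data : ByteArray) (level : UInt8)
    (hsize : data.size < 1024 * 1024 * 1024) :
    ZlibDecode.decompressSingle (ZlibEncode.compress data level) = .ok data
```

Nothing in this statement mentions how a ByteArray is allocated at run time, which is why a bug in a native routine such as lean_alloc_sarray falls outside what the theorem covers.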
Context and significance
This incident exposes a recurring gap in formal-methods deployments: proofs guarantee properties relative to a formal semantics and assumptions about the runtime, but those assumptions can be invalidated by bugs in the runtime, compiler, or external libraries. The result also suggests that AI-assisted vulnerability discovery is speeding up bug finding, making previously obscure runtime bugs far easier to expose. The practical takeaway for verification projects is that the trust boundary must extend beyond high-level proofs to include the toolchain and native runtime, or the runtime must itself be formally verified or replaced with a memory-safe implementation.
What to watch
Reproduce and patch the lean_alloc_sarray overflow in the Lean 4 runtime, rerun the proof/CI pipeline, and consider integrating continuous fuzzing plus sanitizers into verification CI. Longer term, expect more fuzzing and AI-assisted audits to push teams toward verifying runtimes or isolating them behind memory-safe layers.
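As a rough illustration of the kind of check a continuous testing job could run, the sketch below exercises the same round-trip property the lean-zip theorem states, but empirically and against Python's standard zlib module rather than lean-zip. It is not the author's harness: the function names, iteration counts, and input sizes are assumptions chosen for the example, and plain random testing is much weaker than coverage-guided fuzzing with AFL++.

```python
import random
import zlib


def round_trip_ok(data: bytes, level: int) -> bool:
    """Return True if decompress(compress(data, level)) reproduces data exactly."""
    return zlib.decompress(zlib.compress(data, level)) == data


def random_round_trip_check(iterations: int = 1000,
                            max_len: int = 4096,
                            seed: int = 0) -> int:
    """Run round-trip checks on random inputs and return the number of failures.

    All parameters are illustrative defaults, not values from the article.
    """
    rng = random.Random(seed)  # seeded so a failing run can be reproduced
    failures = 0
    for _ in range(iterations):
        length = rng.randint(0, max_len)
        data = bytes(rng.getrandbits(8) for _ in range(length))
        level = rng.randint(0, 9)  # zlib accepts compression levels 0 through 9
        if not round_trip_ok(data, level):
            failures += 1
    return failures


if __name__ == "__main__":
    print("failures:", random_round_trip_check())
```

A check like this only observes application-level behavior. Catching a memory error in a native runtime, as in the lean_alloc_sarray case, additionally requires running the compiled binary under tools such as AddressSanitizer or Valgrind.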
Scoring Rationale
The discovery is significant for practitioners who rely on formal verification: it demonstrates a realistic way for memory-safety bugs to sit outside what the proofs cover. The scope is limited to the Lean runtime rather than a mass-market compiler, so it is notable rather than industry-shaking. Freshness reduces the score slightly.

