Ollama contains critical GGUF out-of-bounds read

Per CVE.org and a GitLab advisory, a heap out-of-bounds read in the GGUF model loader affects Ollama versions before 0.17.1 and is tracked as CVE-2026-7482. Cyera's research team reports that the flaw can be triggered via the /api/create endpoint using a crafted GGUF file whose declared tensor offset and size exceed the file's length: the vulnerable WriteTo() path in fs/ggml/gguf.go and server/quantization.go then reads past the allocated heap buffer. According to the advisories, the leaked memory may include environment variables, API keys, system prompts, and concurrent users' conversation data, and the resulting model artifact can be exfiltrated through the /api/push endpoint. Cyera estimates roughly 300,000 Ollama deployments are publicly reachable. A patch is available in 0.17.1, according to the GitLab advisory and CVE record.
What happened
Per CVE.org and the GitLab advisory, Ollama before 0.17.1 contains a heap out-of-bounds read in the GGUF model loader, tracked as CVE-2026-7482. The vulnerability arises when the /api/create endpoint accepts a crafted GGUF file whose declared tensor offset and size exceed the file's actual length; during quantization, the advisories describe, the WriteTo() call in fs/ggml/gguf.go and server/quantization.go reads past the allocated heap buffer. The leaked heap contents may include environment variables, API keys, system prompts, and concurrent users' conversation data, and per the published reports the resulting artifact can then be pushed via the /api/push endpoint to an attacker-controlled registry. The CVE record assigns the issue a CVSS 3.1 score of 9.1.
Technical details
Per the GitLab advisory and CVE entry, the root cause is a bounds-check bypass in the GGUF model loader during model creation and quantization; the problematic code is the WriteTo() implementation that processes declared tensor metadata. Upstream documentation notes that /api/create and /api/push require no authentication by default, and although Ollama binds to 127.0.0.1 out of the box, the advisories observe that many deployments set OLLAMA_HOST=0.0.0.0, exposing the API to the network. Cyera's technical write-up outlines a three-step exploitation sequence: upload a crafted GGUF file, invoke /api/create to trigger the out-of-bounds read, then use /api/push to exfiltrate the resulting model artifact containing stolen heap data.
Industry context
Editorial analysis: Self-hosted inference platforms that accept user-supplied model artifacts create a high-risk attack surface when model-loading code assumes well-formed binary metadata. Industry reporting on this incident highlights the intersection of binary model formats, unsafe parsing paths, and unauthenticated management endpoints as a recurring class of risk for on-prem and edge LLM deployments.
Impact and scale
Per Cyera, roughly 300,000 Ollama instances are believed to be reachable on the public internet, a figure also cited in SecurityWeek's reporting. The combination of unauthenticated endpoints and the common practice of rebinding OLLAMA_HOST from its loopback default to 0.0.0.0 underpins the broad practical exposure researchers observed.
What to watch
Editorial analysis: Observers and practitioners should monitor vendor and distribution channels for backported fixes, scan for internet-exposed Ollama instances, and audit environments for leaked credentials or attacker-pushed artifacts. Public advisories list 0.17.1 as the patched release, and scanning vendors (for example, runZero and Snyk) have already added detection for the vulnerability.
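As a concrete audit step, each candidate host can be probed via Ollama's documented /api/version endpoint and the reported version compared against the patched release. A minimal sketch (the helpers are illustrative, and the version comparison handles plain x.y.z strings only):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"strconv"
	"strings"
	"time"
)

// olderThan reports whether version a (e.g. "0.16.2") sorts before b
// ("0.17.1"). Plain dotted-numeric versions only; no pre-release handling.
func olderThan(a, b string) bool {
	pa, pb := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(pa) && i < len(pb); i++ {
		na, _ := strconv.Atoi(pa[i])
		nb, _ := strconv.Atoi(pb[i])
		if na != nb {
			return na < nb
		}
	}
	return len(pa) < len(pb)
}

// checkHost queries an Ollama server's /api/version endpoint and flags
// any version below the patched 0.17.1 release.
func checkHost(baseURL string) (version string, vulnerable bool, err error) {
	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(baseURL + "/api/version")
	if err != nil {
		return "", false, err
	}
	defer resp.Body.Close()
	var body struct {
		Version string `json:"version"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return "", false, err
	}
	return body.Version, olderThan(body.Version, "0.17.1"), nil
}

func main() {
	// e.g. checkHost("http://10.0.0.5:11434") against an internal host.
	fmt.Println(olderThan("0.16.2", "0.17.1")) // true
}
```

Only scan hosts you are authorized to test; the probe itself is a plain unauthenticated GET.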
Takeaway for practitioners
Editorial analysis: This incident reinforces the operational need to treat model loaders and artifact ingestion as high-risk parsers, to limit network exposure of admin APIs, and to apply rapid detection and patching workflows for self-hosted inference stacks. The vulnerability shows how binary model formats like GGUF can become an attack vector if parsing code lacks robust bounds enforcement.
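One way to act on the exposure point: leave Ollama bound to loopback and front it with a small authenticating reverse proxy. A sketch under stated assumptions (the bearer-token scheme, port, and helper names are illustrative choices, not an Ollama feature):

```go
package main

import (
	"crypto/subtle"
	"fmt"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// authorized reports whether an Authorization header carries the expected
// bearer token, compared in constant time to avoid timing side channels.
func authorized(header, token string) bool {
	want := "Bearer " + token
	return subtle.ConstantTimeCompare([]byte(header), []byte(want)) == 1
}

// newGatekeeper returns a reverse proxy for an Ollama instance that stays
// bound to loopback, rejecting any request that lacks the token.
func newGatekeeper(upstreamAddr, token string) http.Handler {
	upstream, err := url.Parse(upstreamAddr)
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !authorized(r.Header.Get("Authorization"), token) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		proxy.ServeHTTP(w, r)
	})
}

func main() {
	gate := newGatekeeper("http://127.0.0.1:11434", "change-me")
	// To deploy, expose only the proxy: http.ListenAndServe(":8443", gate)
	_ = gate
	fmt.Println(authorized("Bearer change-me", "change-me")) // true
	fmt.Println(authorized("", "change-me"))                 // false
}
```

The design choice here is defense in depth: even if the loader regresses, /api/create and /api/push are no longer reachable without a credential.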
Scoring Rationale
A high-severity (CVSS 9.1) unauthenticated memory-disclosure flaw in a widely used self-hosted LLM runtime poses major operational risk to practitioners running private inference. The large estimated internet exposure magnifies urgency for patching and detection.

