SGLang Enables Remote Code Execution via Malicious GGUF Models

A critical vulnerability in SGLang, CVE-2026-5760, lets a malicious GGUF model file achieve remote code execution on an inference server through the /v1/rerank endpoint. The flaw stems from unsandboxed template rendering: SGLang uses jinja2.Environment() instead of ImmutableSandboxedEnvironment when processing a model-supplied tokenizer.chat_template. A public proof-of-concept demonstrates a weaponized GGUF carrying an SSTI payload that triggers SGLang's Qwen3 reranker path and executes arbitrary Python, enabling full host takeover. The vulnerability is rated CVSS 9.8 and highlights a supply-chain risk in model metadata. Immediate mitigations are to avoid loading untrusted GGUF models, disable reranker functionality, or apply a patch that enforces sandboxed rendering.
What happened
A critical remote code execution vulnerability, CVE-2026-5760, was discovered in SGLang, the inference framework. An attacker-controlled GGUF model containing a malicious tokenizer.chat_template can trigger server-side template injection (SSTI) in the /v1/rerank endpoint and execute arbitrary Python on the host. The flaw is rated CVSS 9.8 and has a public proof-of-concept on GitHub.
Technical details
The vulnerable code path lives in python/sglang/srt/entrypoints/openai/serving_rerank.py. When SGLang processes a model-supplied chat template, it creates a Jinja2 environment with jinja2.Environment() rather than the safer ImmutableSandboxedEnvironment. The unsandboxed environment places no restrictions on attribute access, so an SSTI payload embedded in tokenizer.chat_template can reach Python globals and run system commands. The PoC, authored by "Stuub" on GitHub, demonstrates a template that uses lipsum.__globals__["os"].popen(...) and a trigger phrase that activates the Qwen3 reranker path.
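The difference between the two environments can be illustrated with a minimal sketch. This is not SGLang's code or the PoC payload: the probe template and the helper names render_unsandboxed and render_sandboxed are assumptions made for illustration, and the probe only reads os.name rather than running a command. It relies on Jinja2's default lipsum global, whose module globals include os.

```python
import os

from jinja2 import Environment
from jinja2.exceptions import SecurityError
from jinja2.sandbox import ImmutableSandboxedEnvironment

# Harmless probe (NOT the PoC payload): it only reads os.name, but it reaches
# the os module through the same lipsum.__globals__ route the article describes.
PROBE = '{{ lipsum.__globals__["os"].name }}'


def render_unsandboxed(template: str) -> str:
    """Render with a plain Environment, which does not restrict attribute access."""
    return Environment().from_string(template).render()


def render_sandboxed(template: str) -> str:
    """Render with the sandbox, which rejects access to underscore-prefixed attributes."""
    return ImmutableSandboxedEnvironment().from_string(template).render()


if __name__ == "__main__":
    # The plain environment happily exposes the os module to the template.
    print("unsandboxed:", render_unsandboxed(PROBE))

    # The sandbox raises SecurityError as soon as the unsafe value is used.
    try:
        render_sandboxed(PROBE)
    except SecurityError as exc:
        print("sandboxed: blocked:", exc)
```

In the unsandboxed case the template returns the value of os.name, which shows that template text alone can reach a module capable of spawning processes. The sandboxed environment refuses the __globals__ lookup, which is why swapping the environment class is the core of the suggested fix.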
Attack mechanics
Practitioners should understand the exploit chain, which is concise and low-effort for an attacker:
- Host a weaponized GGUF model on a model repository such as HuggingFace.
- Trick a target into loading the model into SGLang, or target a shared inference platform that auto-loads community models.
- Send a request that exercises /v1/rerank; SGLang renders the model-supplied tokenizer.chat_template using jinja2.Environment().
- The SSTI payload escapes the template context and executes arbitrary Python, leading to full host compromise.
Context and significance
This is a textbook model-supply-chain vulnerability: untrusted model metadata is treated as code. The weakness classes involved are CWE-1336 (improper neutralization of special elements used in a template engine) and CWE-94 (code injection), and the issue resembles prior SSTI-based RCE incidents in model runtimes, such as the "Llama Drama" family of vulnerabilities. The practical impact is high because many teams and vendors run inference stacks that load community models, and a single malicious GGUF can give an attacker a foothold on the underlying OS, from which they can steal keys or move laterally through internal networks.
Mitigations and immediate actions
Developers and operators should assume hostile model metadata and apply defense-in-depth. Recommended steps are:
- Do not load untrusted GGUF models into production SGLang instances.
- Temporarily disable or firewall the /v1/rerank endpoint until a patch is applied.
- Patch the rendering path to use ImmutableSandboxedEnvironment or otherwise enforce a strict sandbox when rendering templates.
- Validate and sanitize tokenizer.chat_template fields at model import time, and implement metadata whitelisting.
- Run inference workloads in isolated containers or VMs with least privilege and no host filesystem access.
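As one illustration of validating tokenizer.chat_template at import time, the sketch below scans a template string for patterns that commonly appear in Jinja2 SSTI payloads. The pattern list and the function name find_suspicious are hypothetical, not part of SGLang or any GGUF tooling, and a denylist like this is only a heuristic pre-filter: it can be bypassed and should complement sandboxed rendering, not replace it.

```python
import re
from typing import List

# Hypothetical denylist of patterns often seen in Jinja2 SSTI payloads.
SUSPICIOUS_PATTERNS = [
    # Dunder attribute access such as __globals__ or __class__.
    re.compile(r"__\w+__"),
    # Names of functions and modules that can execute code or commands.
    re.compile(r"\b(?:popen|system|subprocess|eval|exec)\b"),
]


def find_suspicious(chat_template: str) -> List[str]:
    """Return every substring of the template that matches a denylist pattern."""
    hits: List[str] = []
    for pattern in SUSPICIOUS_PATTERNS:
        hits.extend(match.group(0) for match in pattern.finditer(chat_template))
    return hits


if __name__ == "__main__":
    # An ordinary chat template that only formats messages.
    benign = "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}\n{% endfor %}"
    # The expression shape described for the PoC, with its argument left elided.
    hostile = '{{ lipsum.__globals__["os"].popen(...) }}'

    for name, template in [("benign", benign), ("hostile", hostile)]:
        hits = find_suspicious(template)
        verdict = "reject" if hits else "accept"
        print(f"{name}: {verdict} {hits}")
```

A loader could run a check of this kind before a model's metadata ever reaches the template engine and refuse to register models that produce any hits. Legitimate chat templates rarely need dunder attributes, so false positives should be uncommon, but that is an assumption worth verifying against the templates a given platform actually serves.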
What to watch
Watch for an official SGLang security patch, vendor advisories such as VU#915947, and model-repository takedowns for weaponized GGUFs. Longer term, platform owners must add metadata validation and runtime isolation for community models to reduce supply-chain attack surface.
Scoring Rationale
This is a high-severity RCE in a widely used inference framework that enables full host compromise via model metadata, creating immediate operational risk for practitioners. The public PoC and high CVSS score increase exploitability and urgency.



