
SGLang Enables Remote Code Execution via Malicious GGUF Models

April 21, 2026 | By LDS Team

Relevance Score: 8.1


A critical vulnerability in SGLang, CVE-2026-5760, lets a malicious GGUF model file achieve remote code execution on an inference server through the /v1/rerank endpoint. The flaw stems from unsandboxed template rendering: SGLang uses jinja2.Environment() instead of ImmutableSandboxedEnvironment when processing a model-supplied tokenizer.chat_template. A public proof-of-concept demonstrates a weaponized GGUF carrying an SSTI payload that triggers SGLang's Qwen3 reranker path and executes arbitrary Python commands, enabling full host takeover. The vulnerability is rated CVSS 9.8 and highlights a supply-chain risk in model metadata. Immediate mitigations include not loading untrusted GGUF models, disabling reranker functionality, or applying a patch that enforces sandboxed rendering.

What happened

A critical remote code execution vulnerability, CVE-2026-5760, was discovered in the SGLang inference framework. An attacker-controlled GGUF model containing a malicious tokenizer.chat_template can trigger server-side template injection (SSTI) in the /v1/rerank endpoint and execute arbitrary Python on the host. The flaw is rated CVSS 9.8, and a proof-of-concept is publicly available on GitHub.

Technical details

The vulnerable code path lives in python/sglang/srt/entrypoints/openai/serving_rerank.py. When SGLang processes a model-supplied chat template, it creates a Jinja2 environment with jinja2.Environment() rather than the safer ImmutableSandboxedEnvironment. Templates rendered in that unsandboxed environment can reach Python internals, so an SSTI payload embedded in tokenizer.chat_template can access Python globals and run system commands. The PoC, authored by "Stuub" on GitHub, demonstrates a template that uses lipsum.__globals__["os"].popen(...) together with a trigger phrase that activates the Qwen3 reranker path.
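The difference between the two environments can be illustrated with a harmless probe. The sketch below is not SGLang's code: the function names and the two template constants are hypothetical, and the probe only reads os.name instead of running a command. It assumes the jinja2 package is installed.

```python
import os

from jinja2 import Environment
from jinja2.exceptions import SecurityError
from jinja2.sandbox import ImmutableSandboxedEnvironment

# Harmless stand-in for a hostile chat template: it only reads os.name,
# but it reaches the os module through the same lipsum.__globals__ path.
PROBE_TEMPLATE = '{{ lipsum.__globals__["os"].name }}'

# An ordinary chat template that only touches the data passed to render().
BENIGN_TEMPLATE = (
    "{% for m in messages %}{{ m['role'] }}: {{ m['content'] }}\n{% endfor %}"
)


def render_unsandboxed(source: str, **context) -> str:
    """Render with a plain Environment, as the vulnerable path is described."""
    return Environment().from_string(source).render(**context)


def render_sandboxed(source: str, **context) -> str:
    """Render with the sandbox; unsafe attribute access raises SecurityError."""
    return ImmutableSandboxedEnvironment().from_string(source).render(**context)


def sandbox_blocks(source: str, **context) -> bool:
    """Return True if the sandbox refuses to render the template."""
    try:
        render_sandboxed(source, **context)
    except SecurityError:
        return True
    return False
```

With the plain environment the probe renders the host's os.name, which shows that the template has reached a Python module. The sandboxed environment raises SecurityError for the same input, while the benign template still renders normally.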

Attack mechanics

The exploit chain is short and low-effort for an attacker:

• Host a weaponized GGUF model on a model repository such as Hugging Face.
• Trick a target into loading the model into SGLang, or target a shared inference platform that auto-loads community models.
• Send a request that exercises /v1/rerank; SGLang renders the model-supplied tokenizer.chat_template using jinja2.Environment().
• The SSTI payload escapes the template context and executes arbitrary Python, leading to full host compromise.

Context and significance

This is a textbook model-supply-chain vulnerability: untrusted model metadata is treated as code. The weakness maps to CWE-1336 (improper neutralization of special elements used in a template engine) and CWE-94 (code injection), and the issue resembles prior SSTI-based RCE incidents in model runtimes, such as "Llama Drama" (CVE-2024-34359) in llama-cpp-python. The practical impact is high because many teams and vendors run inference stacks that load community models, and a single malicious GGUF lets an attacker pivot to the underlying OS, steal keys, or move laterally in internal networks.

Mitigations and immediate actions

Developers and operators should assume hostile model metadata and apply defense-in-depth. Recommended steps are:

• Do not load untrusted GGUF models into production SGLang instances.
• Temporarily disable or firewall the /v1/rerank endpoint until a patch is applied.
• Patch the rendering path to use ImmutableSandboxedEnvironment or otherwise enforce a strict sandbox when rendering templates.
• Validate and sanitize tokenizer.chat_template fields at model import time, and implement metadata whitelisting.
• Run inference workloads in isolated containers or VMs with least privilege and no host filesystem access.
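The import-time validation step could look roughly like the following sketch. Everything here is an assumption for illustration: the pattern list, the allowlisted keys, and the function names are hypothetical and are not part of SGLang or any GGUF tooling. Extracting the metadata from the GGUF file is left out, so the functions take plain strings. A heuristic screen like this can be bypassed and is meant to complement sandboxed rendering, not replace it.

```python
import re

# Hypothetical heuristics, not an SGLang or GGUF API: constructs that rarely
# appear in legitimate chat templates but are common in SSTI payloads.
SUSPICIOUS_PATTERNS = {
    "dunder attribute": re.compile(r"__\w+__"),
    "attr filter": re.compile(r"\|\s*attr\s*\("),
    "process call": re.compile(r"\b(popen|subprocess)\b"),
}

# Hypothetical allowlist of metadata keys an importer is willing to pass on.
ALLOWED_METADATA_KEYS = frozenset({
    "general.architecture",
    "general.name",
    "tokenizer.chat_template",
})


def screen_chat_template(template: str) -> list[str]:
    """Return the name of every suspicious pattern found in the template."""
    return [
        name
        for name, pattern in SUSPICIOUS_PATTERNS.items()
        if pattern.search(template)
    ]


def filter_metadata(metadata: dict[str, str]) -> dict[str, str]:
    """Keep allowlisted keys only; reject a template that trips a heuristic."""
    kept = {k: v for k, v in metadata.items() if k in ALLOWED_METADATA_KEYS}
    findings = screen_chat_template(kept.get("tokenizer.chat_template", ""))
    if findings:
        raise ValueError(
            f"rejected tokenizer.chat_template: {', '.join(findings)}"
        )
    return kept
```

Rejecting the whole model on a finding, rather than trying to strip the offending fragment, keeps the check simple and avoids rendering a partially sanitized template.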

What to watch

Watch for an official SGLang security patch, vendor advisories such as VU#915947, and model-repository takedowns for weaponized GGUFs. Longer term, platform owners must add metadata validation and runtime isolation for community models to reduce supply-chain attack surface.

Key Points

1. Unsandboxed Jinja2 rendering in SGLang lets malicious GGUF model metadata execute arbitrary Python, enabling host takeover.
2. The attack requires only a hosted weaponized GGUF, a victim who loads it, and a request to /v1/rerank, which exposes inference platforms to broad supply-chain risk.
3. Immediate defenses: do not load untrusted models, disable /v1/rerank, apply a sandboxing patch, and enforce runtime isolation.

Scoring Rationale

This is a high-severity RCE in a widely used inference framework that enables full host compromise via model metadata, creating immediate operational risk for practitioners. The public PoC and high CVSS score increase exploitability and urgency.

Sources

Public references used for this report.

01 kb.cert.org: VU#915947 - SGLang is vulnerable to remote code execution when ...

02 github.com: GitHub - Stuub/SGLang-0.5.9-RCE: Proof of Concept exploitation of ...

03 gbhackers.com: Malicious GGUF Models Could Trigger Remote Code Execution on SGLang Servers

04 itsecuritynews.info: Hackers Could Weaponize GGUF Models to Achieve RCE on SGLang Inference Servers
