What happened
Kaspersky published a detailed writeup on May 12, 2026, documenting attempts to hijack privately hosted LLM servers. Per Kaspersky, the author deployed an internet-facing honeypot on a Raspberry Pi that mimicked common local-serving stacks, including Ollama, LM Studio, AutoGPT, LangServe, and text-gen-webui, and advertised a local instance of the model Qwen3-Coder 30B Heretic. According to Kaspersky, the internet scanner Shodan discovered the honeypot within 3 hours; recon-like requests began within 1 hour. Over the following month, the honeypot logged more than 113,000 requests from thousands of unique IPs, with 23% of traffic focused on discovering AI capabilities and exploiting local LLMs or agents.
Technical details
Editorial analysis
Kaspersky's honeypot setup reportedly served pre-saved model responses and exposed OpenAI-compatible API endpoints, which are now a common target for attackers because OpenAI-format APIs are widely used. The article also notes that the honeypot advertised ancillary resources such as RAG databases and an MCP endpoint exposing a get_credentials-like capability, elements that attackers probe to escalate from compute hijacking toward data or credential theft.
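To make "pre-saved responses behind OpenAI-compatible endpoints" concrete, here is a minimal sketch of how a decoy might map request paths to canned JSON. It is not Kaspersky's implementation. The paths follow the public OpenAI-compatible and Ollama API conventions, while `MODEL_ID`, `CANNED`, and `canned_response` are hypothetical names chosen for illustration.

```python
import json

# Assumed identifier for the advertised model; the real honeypot's value is not given.
MODEL_ID = "qwen3-coder-30b-heretic"

# Hypothetical canned replies keyed by request path. Payload values are
# illustrative and only mimic the general shape of these public APIs.
CANNED = {
    "/v1/models": {
        "object": "list",
        "data": [{"id": MODEL_ID, "object": "model", "owned_by": "local"}],
    },
    "/api/tags": {"models": [{"name": MODEL_ID}]},
}


def canned_response(path: str) -> tuple[int, str]:
    """Return (HTTP status, JSON body) for a request path."""
    # Drop any query string and trailing slash before the lookup.
    clean = path.split("?", 1)[0].rstrip("/") or "/"
    payload = CANNED.get(clean)
    if payload is None:
        return 404, json.dumps({"error": "not found"})
    return 200, json.dumps(payload)
```

A function like this could sit behind any small HTTP server; keeping it pure makes the decoy's behaviour easy to test without opening a socket.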
Context and significance
The documented activity illustrates two converging trends practitioners should note: easy-to-deploy local model stacks are increasingly exposed to the open internet, and automated scanners rapidly enumerate and test those endpoints. The Kaspersky data point that nearly one quarter of requests were capability probing underscores that attackers are not just opportunistic scanners but are looking specifically for model-serving behaviour and exploitable agent workflows.
Mitigations outlined
Kaspersky recommends treating private model-serving infrastructure with the same hardening applied to production servers, including restricting internet exposure, enforcing authentication on APIs, network segmentation, inventorying ancillary services (RAG, agents, MCP), and monitoring for reconnaissance patterns. Kaspersky frames these as deployment-day priorities rather than optional post-deployment steps.
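As one illustration of "enforcing authentication on APIs", the sketch below checks a bearer token before a request reaches the model server. It assumes a single shared token and a plain dictionary of headers; the function name `is_authorized` is hypothetical, and a real deployment would more likely rely on a reverse proxy or the serving stack's own access controls.

```python
import hmac


def is_authorized(headers: dict[str, str], expected_token: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header against expected_token."""
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    value = normalized.get("authorization", "")
    scheme, _, token = value.partition(" ")
    token = token.strip()
    # Reject missing or malformed headers, and refuse to run with an empty secret.
    if scheme.lower() != "bearer" or not token or not expected_token:
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.encode(), expected_token.encode())
```

Rejecting an empty `expected_token` is a deliberate choice: a misconfigured server with no secret set should fail closed rather than accept every request.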
What to watch
For practitioners: watch for increased scanning signatures targeting OpenAI-style endpoints, monitor unusual volumes of short capability-probing requests, and prioritize access controls and telemetry on any host that advertises model-serving APIs. Public scanning services like Shodan can surface your exposed endpoints rapidly; defenders should test visibility with controlled honeypots and adjust perimeter rules accordingly.
Key Points
- Honeypot data from Kaspersky shows rapid, large-scale probing of exposed local model servers, with discovery within hours.
- Attackers focus on OpenAI-format APIs and ancillary services (RAG, agent endpoints), increasing the attack surface beyond models themselves.
- Industry practitioners should treat private model servers like production infrastructure: minimize internet exposure and enforce authentication and monitoring.
Scoring Rationale
The story documents a widespread, automated threat against privately hosted model servers with concrete honeypot telemetry, making it a notable operational risk for practitioners running local stacks. It is not a paradigm-shifting research result, but it has immediate operational relevance.

