Researchers Hijack AI Coding Agents, Steal Credentials

Security researchers demonstrated a prompt injection pattern that hijacks AI coding agents integrated with GitHub Actions to exfiltrate API keys and access tokens. The attack targeted Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and Microsoft's GitHub Copilot, with payloads embedded in pull request titles, issue bodies, and comments. Vendors quietly paid bug bounties, including $100 from Anthropic and $500 from GitHub, but did not publish CVEs or public advisories, leaving pinned or older deployments exposed. The flaw is systemic: agents that ingest repository metadata as task context can be coerced to reveal secrets from the runner environment or to post sensitive outputs as PR comments. Teams should audit pipelines, restrict permissions, and apply input-sanitization and secret-scoping patterns immediately.
What happened
Researchers from Johns Hopkins, led publicly by Aonan Guan, exploited a prompt injection pattern to hijack AI coding agents that run inside GitHub Actions. They successfully coerced Anthropic's Claude Code Security Review, Google's Gemini CLI Action, and Microsoft's GitHub Copilot into leaking API keys, GitHub access tokens, and other secrets available to the Actions runner. Vendors paid small bug bounties, including $100 from Anthropic and $500 from GitHub, with an undisclosed payment from Google. None of the vendors issued CVEs or public advisories, which Guan warned could leave pinned or unpatched users unaware: "If they don't publish an advisory, those users may never know they are vulnerable - or under attack."
Technical details
The attack is an indirect prompt injection that abuses how agents ingest repository data as task context. The malicious workflow looks like ordinary PR metadata, but embeds instructions that override or augment agent directives. The agent then executes the injected instruction, includes sensitive values in a structured response, and that response is posted back to the repo as a comment or artifact, producing public leakage.
- •Affected integrations read PR titles, issue bodies, and comments as context and do not reliably separate untrusted content from control instructions
- •Exfiltration vectors include JSON outputs posted as PR comments and any text that is captured by the agent and forwarded to downstream systems
- •Runner-level secrets and API keys in the environment can be incorporated into agent outputs if the agent queries or echoes environment state
Practical mitigations
This is a workflow and design failure, not just a model bug. Apply least-privilege to Actions runners and tokens, rotate credentials frequently, and never expose long-lived secrets in runner environments. Sanitize all untrusted repository inputs before passing them to agent prompts. Harden agent prompt architecture with explicit system messages that deny execution of instructions found in repository content, and prefer cryptographic metadata or signed attestations for trusted inputs. Consider isolating agent execution, using ephemeral tokens scoped to single operations, and adding secret-detection rules that prevent posting structured outputs containing high-entropy strings.
Context and significance
Autonomous agents are being granted system-level permissions to read and act on developer workflows. This incident is a clear, practical demonstration of the broader prompt injection threat model documented in research but now weaponized against production CI/CD integrations. The vendors' choice to resolve quietly via bounty payments without public advisories increases the exploitation window for pinned installations. Some outlets reported additional claims about mass-installation attacks using different agents and tooling; while details vary, the underlying failure mode is consistent and pervasive across agent architectures that trust repository content.
What to watch
Expect public advisories, CVEs, and patched action releases as pressure mounts. Security teams should immediately audit Actions workflows, rotate exposed credentials, and implement the mitigations above. Long term, platform changes that default to minimal permissions for actions and that provide explicit trust tokens for agent inputs will be necessary to reduce this class of risk.
Scoring Rationale
This is a high-impact security finding because it affects major vendor integrations and enables credential theft from CI/CD pipelines. The vendors' lack of public advisories increases exploit risk and makes immediate practitioner action necessary.
Practice with real Ad Tech data
90 SQL & Python problems · 15 industry datasets
250 free problems · No credit card
See all Ad Tech problemsStep-by-step roadmaps from zero to job-ready — curated courses, salary data, and the exact learning order that gets you hired.