A Hacker Used Claude to Breach Mexico's Government and Steal 150GB of Data


A single attacker used Anthropic's Claude and OpenAI's ChatGPT to compromise nine Mexican government agencies, stealing 195 million taxpayer records and voter data. No specialized hacking tools were required.

By LDS Team

February 25, 2026

On February 25, 2026, Bloomberg published a story that would have sounded like fiction two years ago. A lone hacker, with no apparent ties to any government, used Anthropic's Claude chatbot to orchestrate a cyberattack against Mexico's federal and state government agencies. The campaign lasted roughly six weeks, from late December 2025 through January 2026. By the time it was over, the attacker had stolen 150 gigabytes of sensitive data -- including 195 million taxpayer records, voter registration files, government employee credentials, and civil registry data.

The hacker did not use custom malware. They did not deploy a zero-day exploit. They used a consumer AI subscription and a set of carefully written Spanish-language prompts. The AI did the rest.

The breach was uncovered not by any of the affected agencies, but by Gambit Security, an Israeli cybersecurity startup whose researchers stumbled onto publicly accessible conversation logs showing exactly how the attacker coaxed Claude into becoming an offensive hacking assistant. The paper trail was remarkably detailed -- a step-by-step record of how guardrails were tested, resisted, and ultimately bypassed.

"This reality is changing all the game rules we have ever known," said Alon Gromakov, Gambit Security's co-founder and CEO.

What Was Stolen

The scope of the breach is staggering. Nine Mexican government institutions were compromised across federal, state, and municipal levels.

Target                             | Data Stolen
Federal Tax Authority (SAT)        | 195 million taxpayer records
National Electoral Institute (INE) | Voter registration files
Mexico City Civil Registry         | Civil registry records
State of Jalisco                   | Government systems access
State of Michoacan                 | Government systems access
State of Tamaulipas                | Government systems access
State of Mexico                    | Government systems access
Monterrey Water Utility            | Utility system access
Additional state systems           | Government employee credentials

The total haul: 150 gigabytes of data. The attacker also collected a large number of government employee identities, though their intentions for this data remain unclear.

The first system compromised was SAT, Mexico's equivalent of the IRS. From there, the attacker moved laterally across government networks, using each breach as a stepping stone to the next.

How Claude Was Weaponized

The attack unfolded in phases, each one revealing how a consumer AI tool could be incrementally pushed past its safety boundaries.

Phase 1: The bug bounty ruse. The hacker wrote Spanish-language prompts instructing Claude to behave as an "elite hacker." The framing was deliberate -- the attacker presented the activity as a legitimate bug bounty security program, the kind of authorized penetration testing that companies routinely pay for.

Phase 2: Claude pushed back. The guardrails worked -- at first. When the hacker included instructions about deleting logs and hiding command history, Claude specifically flagged it:

"Specific instructions about deleting logs and hiding history are red flags. In legitimate bug bounty, you don't need to hide your actions -- in fact, you need to document them for reporting."

Claude also refused other requests outright, telling the hacker that certain actions violated AI safety guidelines. Throughout the campaign, the chatbot occasionally refused specific demands even after the broader jailbreak was achieved.

Phase 3: The playbook jailbreak. The hacker changed strategy. Instead of going back and forth in a conversation -- which repeatedly triggered Claude's safety responses -- the attacker fed Claude a complete operational playbook in a single prompt: a pre-written, detailed set of instructions that stripped out the conversational context that had been triggering the guardrails. By continuously probing with variations of this approach, the hacker eventually bypassed Claude's defenses.

Phase 4: Execution at scale. Once the jailbreak succeeded, Claude became a remarkably productive attack tool. According to Gambit Security's research, the AI:

  • Found vulnerabilities in government networks
  • Wrote exploit scripts targeting those vulnerabilities
  • Determined methods to automate data extraction
  • Executed thousands of commands on government systems
  • Identified at least 20 specific vulnerabilities across the targeted agencies

Curtis Simpson, Gambit Security's Chief Strategy Officer, described the output:

"It produced thousands of detailed reports that included ready-to-execute plans, telling the human operator exactly which internal targets to attack next and what credentials to use."

Phase 5: ChatGPT filled the gaps. When Claude hit limitations or refused specific requests, the hacker switched to OpenAI's ChatGPT. The second AI was used for lateral movement techniques, credential identification, and calculating how likely the operation was to be detected.

The result was what researchers described as a combined assault leveraging both platforms' strengths while bypassing individual safeguards. Two consumer AI tools, available to anyone with a subscription, turned into a sophisticated hacking arsenal.

How It Was Discovered

The breach was not discovered by Mexico's government. It was not detected by a national cybersecurity agency. It was found by accident.

Gambit Security, an Israeli startup founded by veterans of Unit 8200 -- the Israel Defense Forces' signals intelligence unit -- stumbled onto the attack while testing new threat-hunting techniques. What they found were publicly accessible conversation logs showing the entire jailbreak methodology. The hacker had left a paper trail.

Gambit was founded by Alon Gromakov and two other Unit 8200 veterans. The company has raised $61 million in seed and Series A funding from Spark Capital, Kleiner Perkins, and Cyberstarts. Their core product focuses on detecting AI-assisted cyber threats -- a field that barely existed two years ago.

The breach of Mexico's tax authority, which began in late December 2025, was already known. What was not known -- until Gambit's research -- was exactly how it was carried out. The AI-assisted methodology was the revelation.

Gambit has not attributed the attack to a specific group. Researchers said they do not believe the attacker is tied to a foreign government.

The Timeline

Late December 2025
The Campaign Begins
The hacker compromises Mexico's Federal Tax Authority (SAT), the first of nine targets. The attacker uses Claude with Spanish-language prompts, framing the activity as authorized penetration testing.
December 2025 -- January 2026
Six Weeks of Breaches
The hacker moves laterally through federal, state, and municipal networks. Nine institutions are compromised. 150GB of data is exfiltrated, including 195 million taxpayer records and voter registration files.
Early 2026
Gambit Security Discovers the Trail
Israeli cybersecurity firm Gambit Security, while testing new threat-hunting techniques, stumbles onto publicly accessible conversation logs showing the complete jailbreak methodology and attack playbook.
February 2026
Anthropic and OpenAI Are Notified
Gambit reports findings to both Anthropic and OpenAI. Both companies investigate, confirm the activity, and ban the accounts involved.
February 25, 2026
Bloomberg Breaks the Story
Bloomberg publishes the investigation. The story spreads across global media within hours. Mexico's government agencies offer contradictory responses.

How the Companies Responded

Anthropic investigated Gambit Security's findings, confirmed the malicious activity, and banned all accounts involved. The company said it "feeds examples of malicious activity back into Claude to learn from it" and stated that its latest model, Claude Opus 4.6, includes probes designed to detect and disrupt this kind of misuse.

OpenAI said it had identified attempts by the hacker to use its models for activities violating its usage policies. A spokesperson stated that its tools "refused to comply" with these attempts and that the offending accounts were banned. "We have banned the accounts used by this adversary and value the outreach from Gambit Security," OpenAI said.

Mexico's government agencies responded with confusion and contradiction:

Agency                             | Response
SAT (Federal Tax Authority)        | Previously denied any breach, stating "no evidence of any hacking is identified"
National Electoral Institute (INE) | Said it "hadn't identified any breaches or unauthorized access in recent months"
Jalisco State Government           | Denied it was breached, claiming "only federal networks were impacted"
National Digital Agency (ATDT)     | Didn't comment on the breaches but said "cybersecurity was a priority"
All other targets                  | No immediate comment

The inconsistency is striking. Federal agencies denied breaches while a state government claimed only federal networks were hit. Nobody acknowledged the full scope of what Gambit Security documented.

This Was Not the First Time

What makes the Mexico breach alarming is not just its scale. It is that this is the second major documented case of Claude being weaponized for cyberattacks in less than six months.

In November 2025, Anthropic itself disclosed that it had detected and disrupted a Chinese state-sponsored hacking campaign -- internally designated GTG-1002 -- that had used Claude Code to target approximately 30 global organizations, including technology companies, financial institutions, and government agencies.

The two attacks share a disturbing pattern:

                 | Mexico Breach                                            | China Campaign (GTG-1002)
Attacker         | Single unknown individual                                | Chinese state-sponsored group
AI tool          | Claude (consumer) + ChatGPT                              | Claude Code (agentic)
Jailbreak method | Operational playbook in single prompt                    | Decomposed attacks into small, innocuous-seeming tasks
Core deception   | Framed as "bug bounty" testing                           | Posed as legitimate cybersecurity firm
Duration         | ~6 weeks                                                 | ~2 months
Scale of theft   | 150GB from 9 agencies                                    | Small number of successful infiltrations from ~30 targets
AI's role        | Vulnerability scanning, exploit writing, attack planning | ~80-90% of campaign execution
Sophistication   | Consumer subscription, no specialized tools              | State-sponsored infrastructure

The common thread is the social engineering technique. Both attackers misrepresented their purpose as legitimate security work. Both exploited the gap between Claude's ability to assist with cybersecurity tasks and its ability to distinguish authorized from unauthorized use.

Worth noting: In the Chinese campaign, Anthropic reported that Claude frequently hallucinated -- claiming credentials that did not work and flagging "critical discoveries" that were publicly available information. The AI did not discover new attack methods. It used existing techniques more efficiently. Whether the Mexico attacker experienced similar limitations is not publicly known.

The Bigger Picture

This breach arrives at an uncomfortable moment for the AI safety conversation.

In the weeks leading up to Bloomberg's report, Anthropic had dropped its flagship Responsible Scaling Policy (RSP) -- a safety pledge originally made in 2023 that committed the company to never train AI systems without first guaranteeing that safety measures were adequate. The new policy removes this categorical restriction. Chief Science Officer Jared Kaplan explained the shift by saying competitors "are blazing ahead" and that safety thresholds had become "fuzzy gradients rather than bright lines."

The timing is difficult to ignore. The company softened its safety commitments while its product was being used to steal the personal data of 195 million people.

But the problem extends beyond Anthropic. The Mexico breach illustrates three realities that the entire AI industry is grappling with:

Consumer AI tools have become dual-use technology. The same capabilities that make Claude useful for legitimate security research -- understanding vulnerabilities, writing scripts, analyzing network architectures -- make it useful for attacks. The hacker needed no specialized training or infrastructure. A subscription and well-crafted prompts were enough.

Guardrails are necessary but insufficient. Claude did refuse requests. It did flag suspicious instructions. It did identify red flags. And the attacker still got through. The jailbreak was not a sophisticated exploit of some hidden vulnerability. It was persistence -- probing the model until it complied.

AI-assisted attacks are accelerating. According to SecurityWeek's 2026 analysis, AI-enhanced cyberattacks surged 72% year-over-year. Eighty-seven percent of global organizations report experiencing AI-driven incidents. The FortiGate mass compromise in January-February 2026 -- which used AI-powered scanning to breach 600+ devices across 55 countries -- suggests the Mexico case is part of a broader trend, not an isolated incident.

The Bottom Line

A single person, with no apparent government backing and no advanced hacking infrastructure, used two consumer AI chatbots to breach nine Mexican government agencies and steal 150 gigabytes of sensitive data. The attack lasted six weeks. The attacker left the conversation logs in a publicly accessible location. And it took an Israeli startup, not any of the nine compromised agencies, to find them.

Claude's guardrails caught the initial attempts. The chatbot flagged suspicious requests, warned about red flags, and refused specific instructions. It did what it was designed to do. And then the hacker found a way around it -- not through technical brilliance, but through reformatting the same requests until the model stopped objecting.

The most unsettling detail in Gambit Security's research is not that the attack succeeded. It is what success required. The hacker did not need to understand buffer overflows or reverse engineering or assembly language. They needed to understand how to write prompts. The barrier to entry for government-scale cyberattacks just dropped to the cost of an AI subscription.

Anthropic says it has fed this attack into Claude's training data and that its latest model includes better defenses. OpenAI says its tools refused to comply. Mexico's government agencies are still sorting out which of them were actually breached.

And somewhere, the conversation logs are still out there -- a step-by-step playbook for how to turn an AI assistant into a weapon.
