
Microsoft, Google, and xAI Signed AI Safety Testing Agreements on Tuesday. The Models Come Without Safeguards.

LDS Team · Let's Data Science · 8 min read
On May 5, the Center for AI Standards and Innovation finalized agreements with three more frontier AI labs. The deals give the Commerce Department pre-release access to models with safety features stripped, allow testing inside classified environments, and route evaluations through an interagency task force convened in 2024.

The press release went up Tuesday afternoon on the National Institute of Standards and Technology homepage. Three new signatures: Google DeepMind, Microsoft, and xAI. With them, the US government now holds voluntary pre-release evaluation agreements with every major American frontier AI lab.

The other two were already in place. OpenAI and Anthropic signed in August 2024, when the same office was called the US AI Safety Institute. Tuesday's announcement closes a loop the Trump administration began in 2025, when Commerce Secretary Howard Lutnick reorganized AISI into the Center for AI Standards and Innovation, or CAISI, and renegotiated the older agreements under what the administration calls America's AI Action Plan.

The deal terms are unusual. Under all five agreements, the labs are expected to give CAISI access to frontier models before they ship publicly. CAISI then evaluates those models for national security risks tied specifically to cybersecurity, biosecurity, and chemical weapons capabilities. According to the NIST announcement, developers frequently provide CAISI with models that have reduced or removed safeguards, so evaluators can probe raw capabilities, not just what end users can elicit through guardrails.

Translation: the government now sees the unguarded version.

How Many Evaluations Have Already Happened

CAISI is not a new effort. It is the renaming and reorganization of an office that has existed inside NIST since late 2023. According to the NIST release, CAISI has now completed more than 40 evaluations of frontier AI models, including state-of-the-art systems that have not been publicly released.

That track record is much of why the new signatures matter. CAISI is the only US government body with a multi-year record of testing frontier AI models inside classified environments. The agreements with OpenAI and Anthropic gave it visibility into two of the five major labs. Tuesday's signatures extend that visibility to all five.

"Independent, rigorous measurement science is essential to understanding frontier AI and its national security implications. These expanded industry collaborations help us scale our work in the public interest at a critical moment." — Chris Fall, CAISI Director (NIST press release, May 5, 2026)

What the Task Force Actually Looks Like

The evaluations run through a body called the TRAINS Taskforce. The acronym stands for Testing Risks of AI for National Security. It was convened in November 2024 under the prior administration and has carried over. According to NIST, TRAINS pulls evaluators from across the federal government:

  • The Department of Defense
  • The Department of Energy, including the national laboratories
  • The Department of Homeland Security
  • The Department of Health and Human Services, via the National Institutes of Health

That mix is what makes the testing credible across multiple risk domains. NIH evaluators handle biosecurity. The national labs handle chemical and nuclear proliferation. DoD and DHS handle the cybersecurity work.

Industry's job under the agreements is to deliver test access on a defined timeline before each public model launch. The companies retain control of the model weights. CAISI delivers a written assessment back to the developer, often before any external red-team or bug-bounty effort runs.

The Scope Is Narrower Than It Sounds

For data scientists and ML engineers building on top of these models, the agreements have one immediate practical consequence. Starting now, every underlying model served through the OpenAI, Anthropic, Google, Microsoft, and xAI APIs passes through a federal national security evaluation before reaching the public.

That cuts both ways. On the upside, the most catastrophic capability concerns (synthesis of novel pathogens, untraceable network exploits, chemical weapon precursor design) will have been measured, with classified findings shared back to the developers. On the downside, the agreements are fully voluntary. No company is legally compelled to submit a model for review, and no public process exists for what happens if a CAISI finding flags a model as unsafe.

Here is what is and is not inside the framework as written:

Covered by CAISI agreements:

  • Cybersecurity capability (offensive cyber tools)
  • Biosecurity (pathogen synthesis, dual-use bio research)
  • Chemical weapons capability
  • Pre-deployment access to unsafeguarded model variants
  • Post-deployment monitoring and targeted research
  • Classified environment testing

Not covered:

  • Bias and discrimination assessment
  • Copyright and intellectual property
  • Misinformation generation
  • Deepfakes and impersonation
  • Labor displacement effects
  • Consumer privacy concerns

The narrowness is intentional. The Trump administration's AI Action Plan, published in 2025, moved AI policy away from the broader civil-rights and consumer-protection framing of the Biden-era executive order. CAISI's mandate as it stands tracks closely with what the AI Action Plan called demonstrable risks: cyber, bio, chemical, and nuclear.

The Other Side: Voluntary, Not Mandatory

The most consistent critique of the new framework is that voluntary agreements without enforcement leave the government in a weaker position than the public might expect.

Cybersecurity Dive, in its coverage on Wednesday, noted that no language in any published version of the agreements compels a developer to delay a model launch if CAISI flags a serious risk. The Hill quoted AI policy researchers who argued the framework substitutes structured access for actual rule-making, and that structured access depends on labs continuing to comply at a moment when commercial incentives push the other way.

Al Jazeera and the Washington Post both situated the deals inside a broader pattern in which AI safety policy in the United States is being made through voluntary commitments rather than statute. Congress has not passed a federal AI safety law. State-level AI rules remain in legal flux after the White House's preemption framework reshaped the landscape earlier this year.

The industry response has been generally favorable, in part because voluntary agreements suit the labs' commercial timelines better than statutory rules. Aaron Cooper, Senior Vice President of Global Policy at the Business Software Alliance, told reporters that CAISI brings the necessary expertise to evaluate frontier models for safety and national security risks. None of the five companies have publicly objected.

A second open question concerns what happens to the test data and the unsafeguarded model copies the government now holds. The NIST release does not specify retention policies, handling protocols inside classified facilities, or sunset clauses on the agreements.

Why the Labs Signed Anyway

For the labs, the agreements are not free. Each company commits engineering and safety-team time to preparing models for evaluation, often months before a public launch. Anthropic's pre-deployment work under the August 2024 agreement reportedly involved several weeks of staff time per model, and the company has continued the practice through its most recent releases.

The incentive to comply is largely reputational and procurement-driven. CAISI findings can flow to federal customers, including the Pentagon, which has signed AI procurement deals with eight major vendors this year. A vendor that has refused or failed a CAISI evaluation looks worse on a federal RFP than one that has cooperated.

The pattern holds across the five labs even where their commercial interests diverge. xAI has been a loud public skeptic of AI safety regulation and still signed. Microsoft, which is the most exposed to government procurement, signed the longest-term agreement. Google DeepMind, with the largest international footprint, signed terms similar to those of its US-only competitors.

The Bottom Line

In August 2024, the US government persuaded two AI labs to let it test their models before public release. In May 2026, that number became five. Every major American frontier AI developer is now under voluntary federal pre-deployment review.

The agreements do not have the force of statute. CAISI cannot block a launch. But they give the federal government something it did not have a year ago: a written, multi-lab record of what frontier AI models can do when their safety filters are stripped away. That record, classified and held inside NIST, is now the closest thing the country has to an AI capabilities baseline.

What it will take to turn that baseline into actual policy remains an open question. The next legislative session and the next state court ruling on AI preemption will both arrive before CAISI completes evaluations of the labs' next-generation systems.

Until then, the government has structured access. The labs control when and what they submit. And the models, evaluated or not, will keep shipping.
