MAI-Thinking-1 is a mixture-of-experts model with 35 billion active parameters that Microsoft says matches Claude Opus 4.6 on a leading coding benchmark and that human raters preferred over Claude Sonnet 4.6. The flagship of a new seven-model family, it was trained from the ground up with no distillation from other models. Microsoft has poured 13 billion dollars into OpenAI; now it is building models that would let it rely on OpenAI less.

On Tuesday, June 2, Mustafa Suleyman walked onto the stage at Microsoft Build and presented seven AI models his team had built from scratch. The flagship was a reasoning model called MAI-Thinking-1, and the most interesting thing about it was not a benchmark score. It was how Microsoft chose to train it.

The fastest way to build a strong model in 2026 is to distill one. You take a frontier system, generate millions of its answers, and train a smaller model to imitate them. It is cheaper and quicker, and it works. Microsoft, which has access to OpenAI's models and Anthropic's, did not do that. MAI-Thinking-1 was "trained from the ground up on enterprise grade, clean and commercially licensed data, without distillation from third-party models," the company wrote. The whole point of the model is that it learned rather than copied.

For the company that has leaned on other labs' models more than any other, that is a statement of intent. Microsoft has put 13 billion dollars into OpenAI across multiple rounds, making it the ChatGPT maker's largest backer. The model it unveiled on Tuesday is the clearest evidence yet that it would rather not have to depend on OpenAI.

The Reasoning Model That Refused the Shortcut

Suleyman's team frames its work around what it calls a Hill-Climbing Machine: a training pipeline designed so that every component, from data to rewards to compute, can be improved continually instead of in one-off leaps. Three principles hold it together, and each one is a quiet argument against the way most labs build.

The first is that capabilities should be learned, not inherited. Microsoft's reasoning is that an imitator is "fundamentally tied to the design choices of its teacher and struggles to adapt to new situations." A distilled model inherits a ceiling. The second is clean data: MAI-Thinking-1 was trained on appropriately licensed material with AI-generated content deliberately excluded from pre-training, on the logic that "if we cannot account for what shaped a model, we cannot fully understand its behavior." The third is self-sufficiency across the stack, from co-designing the models with Microsoft's own accelerators to running an in-house reinforcement learning framework.

That last pillar is the one that connects a technical choice to a corporate strategy. Owning the data, the training infrastructure, and the silicon means owning the model outright, with no dependency on a partner who might also be a rival.

The Benchmarks Microsoft Is Claiming

MAI-Thinking-1 is a sparse mixture-of-experts model with 35 billion active parameters out of roughly 1 trillion total, meaning only about 3.5 percent of the network fires on any given token. That keeps the compute per token, and much of the serving cost, closer to a mid-size model's than its total size suggests, although all of the weights still have to be held in memory. It ships with a 256,000-token context window, enough to hold a 600-page document, plus function calling and compatibility with the widely used Chat Completions API.
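To see why sparsity keeps per-token compute low, here is a toy top-k routing layer in plain Python. It is a minimal sketch, not Microsoft's design: the routing details of MAI-Thinking-1 are not described here, so the expert count (8), the top-k value (2), the random weights, and the elementwise "experts" standing in for feed-forward blocks are all illustrative assumptions. What it shows is that a router scores every expert but only the chosen few are actually evaluated.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ToyMoELayer:
    """Hypothetical sparse mixture-of-experts layer with top-k routing (illustration only)."""

    def __init__(self, dim, num_experts, top_k, seed=0):
        rng = random.Random(seed)
        self.top_k = top_k
        # One router vector per expert; a token's score for an expert is a dot product.
        self.router = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_experts)]
        # Each toy "expert" is an elementwise scaling standing in for a feed-forward block.
        self.experts = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_experts)]

    def forward(self, token):
        # Score every expert (cheap), then keep only the top_k highest scores.
        scores = [dot(w, token) for w in self.router]
        chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[: self.top_k]
        # Renormalize the gate weights over the chosen experts only.
        gates = softmax([scores[i] for i in chosen])
        out = [0.0] * len(token)
        for gate, i in zip(gates, chosen):
            # Only the selected experts are ever evaluated for this token.
            expert_out = [w * x for w, x in zip(self.experts[i], token)]
            out = [o + gate * e for o, e in zip(out, expert_out)]
        return out, chosen


layer = ToyMoELayer(dim=4, num_experts=8, top_k=2)
output, used = layer.forward([0.5, -1.0, 0.25, 2.0])
print(f"experts evaluated: {len(used)} of 8")  # experts evaluated: 2 of 8
print(f"reported active share: {35e9 / 1e12:.1%}")  # reported active share: 3.5%
```

Only 2 of the 8 toy experts run for each token, so the work per token scales with active rather than total parameters; the reported 35 billion out of roughly 1 trillion works out to about 3.5 percent. By the same back-of-envelope logic, the 600-page figure implies roughly 427 tokens per page.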

The headline results, all reported by Microsoft from its own evaluations:

Benchmark	MAI-Thinking-1 result	What it measures
AIME 2025	97.0%	Competition-level math reasoning
AIME 2026	94.5%	Competition-level math reasoning, newest set
SWE-Bench Pro	Toe-to-toe with Claude Opus 4.6	Real-world software engineering tasks
Blind human preference	Preferred over Claude Sonnet 4.6	Helpfulness across 1,276 tasks

The human-preference result is the one Microsoft leans on hardest. It ran a blind side-by-side evaluation with its partner Surge, using professional raters across 1,276 tasks spanning single-turn and multi-turn conversations, and reports that raters preferred MAI-Thinking-1's answers to Claude Sonnet 4.6's. For practitioners, the part that matters is matching Opus 4.6 on coding with only 35 billion active parameters, because compute per token decides how quickly a coding assistant responds and how often a team can afford to call it.
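Microsoft has not published the vote split behind "preferred," so the claim on its own says nothing about the margin. A minimal sketch of how a blind pairwise comparison is commonly summarized: a win rate with an approximate 95 percent Wilson score interval. Counting a tie as half a win for each side is one convention among several and is assumed here, and the counts in the example are invented for illustration; they are not Microsoft's data.

```python
import math


def preference_summary(wins, losses, ties=0, z=1.96):
    """Win rate of model A over model B from blind pairwise ratings.

    Assumes ties count as half a win for each side. Returns the win rate and an
    approximate Wilson score interval (z=1.96 gives roughly 95% coverage).
    """
    n = wins + losses + ties
    if n == 0:
        raise ValueError("need at least one rating")
    p = (wins + 0.5 * ties) / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, (center - half_width, center + half_width)


# Invented split over 1,276 tasks, for illustration only (not Microsoft's figures).
rate, (low, high) = preference_summary(wins=700, losses=500, ties=76)
print(f"win rate {rate:.3f}, 95% interval {low:.3f} to {high:.3f}")
```

If the interval's lower bound sits above 0.5, the preference is unlikely to be rating noise; a narrow win over 1,276 tasks and a lopsided one would both be reported as "preferred."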

If you want the plain-English version of why a "reasoning model" is treated as a category of its own, our guide to how reasoning models learned to think step by step walks through the mechanics.

Seven Models, One Message

MAI-Thinking-1 did not arrive alone. The Microsoft AI Superintelligence Team shipped a family of seven in-house models spanning image, voice, transcription, coding, and reasoning. The ones developers will meet first:

MAI-Code-1-Flash, a lightweight 5-billion-parameter coding model now rolling out inside Visual Studio Code and GitHub Copilot, built for fast agentic edits rather than heavyweight reasoning.
MAI-Image-2.5, an image model Microsoft says ranks second on a leading image-editing leaderboard, ahead of Google's Nano Banana Pro.
MAI-Transcribe-1.5, a speech-to-text model aimed at turning noisy audio into accurate, domain-specific transcripts.

MAI-Thinking-1 is available in private preview today on Microsoft Foundry, the same platform where Microsoft hosts OpenAI's models and Anthropic's, including the recently shipped Claude Opus 4.8. A public preview on Microsoft's MAI Playground is promised soon. For now, the company's own model sits on the shelf next to those of the partners it is trying to depend on less.

Why Microsoft Needs Models It Owns

The strategy makes sense once you map the alliances. Microsoft is OpenAI's largest investor. It also committed up to 5 billion dollars to Anthropic last year and wired Claude into its Copilot products. On paper, it has the best models in the world a phone call away.

The problem is that those phone calls increasingly reach competitors. Anthropic is backed by Google and Amazon, both direct Microsoft rivals in cloud. Microsoft and OpenAI renegotiated their partnership earlier this year to loosen the exclusivity that once bound them, and OpenAI has since grown closer to Amazon, landing its models on Amazon Bedrock one day after its Microsoft exclusivity ended. The practical result is that Microsoft built its AI business on models it does not control, sold by partners with their own agendas.

"This is all about long term self-sufficiency for Microsoft and our partners," Suleyman wrote in the post announcing the models. "It's about models you can trust." Microsoft wraps the effort in a philosophy it calls Humanist Superintelligence, advanced AI meant to "serve people and organizations, not replace them," with a pointed line that its models "must not refuse legitimate requests under the guise of safety and compliance." That is a swipe at competitors' reputation for over-refusal, and a pitch to enterprises that want a model they can steer.

The Other Side

Every number above comes from Microsoft. The benchmark figures are pulled from the company's own model card, the human-preference test was run with a Microsoft partner, and MAI-Thinking-1 has not yet appeared on the independent public leaderboards where rivals are scored head to head. Until it does, "matches Claude Opus 4.6" is a vendor claim, not a verified ranking.

There is also a timing wrinkle worth naming. Microsoft chose Opus 4.6 as its bar, and Opus 4.6 is no longer Anthropic's best model. Anthropic shipped Opus 4.8 on May 29, days before Build, and OpenAI and Google have moved on to GPT-5.5 and Gemini 3.1, respectively. MAI-Thinking-1 is a credible mid-weight reasoning model that draws even with a system from a generation ago, which is a real achievement for a first attempt but still a step behind the current frontier. The "no distillation, clean data" claims, meanwhile, are difficult for anyone outside Microsoft to verify.

And one model, however principled, does not undo a 13-billion-dollar entanglement overnight. Microsoft will keep paying for OpenAI and Anthropic models because its customers want them. Self-sufficiency is a direction, not a destination Microsoft has reached.

The Bottom Line

The substance of Tuesday's announcement is narrower than the strategy behind it. MAI-Thinking-1 is a solid medium-size reasoning model that, by its maker's own numbers, trades blows with a frontier model from last quarter. That alone would be a footnote in a year crowded with model releases.

What makes it matter is the choice it represents. Microsoft had every shortcut available and took the long road on purpose, building a model it fully owns rather than one it borrowed. The bet is that in an industry where today's partner is tomorrow's competitor, the most valuable model is not the smartest one. It is the one you do not have to ask permission to use.