Microsoft unveils MAI-Thinking-1 and new MAI models

At Build 2026, Microsoft AI CEO Mustafa Suleyman unveiled seven new in-house models, led by MAI-Thinking-1, the company's first reasoning model. Per Microsoft, it is a mid-sized sparse Mixture-of-Experts model with about 35 billion active and roughly 1 trillion total parameters and a 256,000-token context window, trained from scratch on commercially licensed enterprise data with no distillation from third-party models such as OpenAI's. Microsoft reports 97.0% on AIME 2025 and 94.5% on AIME 2026, says the model matches Claude Opus 4.6 on the SWE-Bench Pro coding benchmark, and says it was preferred over Claude Sonnet 4.6 in blind human evaluations run by its rating partner Surge. The launch also includes a multimodal family - MAI-Image-2.5, MAI-Voice-2, MAI-Transcribe-1.5 - and an efficient coding model, MAI-Code-1-Flash, now rolling out across GitHub Copilot and VS Code. MAI-Thinking-1 is in private preview in Microsoft Foundry, with partners such as Baseten hosting it.
What happened
At Build 2026, Microsoft AI CEO Mustafa Suleyman unveiled seven new first-party models, headlined by MAI-Thinking-1, Microsoft's first reasoning model. Microsoft describes it as a mid-sized model trained entirely in-house and positions it alongside a broader multimodal family and an efficient coding model. MAI-Thinking-1 is available in private preview through Microsoft Foundry, and partners including Baseten announced hosting.
The model
Per Microsoft, MAI-Thinking-1 uses a sparse Mixture-of-Experts (MoE) architecture with roughly 35 billion active parameters out of about 1 trillion total, or about 3.5% of the weights. It activates only the components needed per request to control inference cost, and it supports a 256,000-token context window. Microsoft says the model was trained from scratch on commercially licensed enterprise data with no distillation from third-party models, explicitly including OpenAI's. It also describes a repeatable internal pipeline it calls the "Hill-Climbing Machine", built on three pillars: capabilities learned without distillation, clean licensed pretraining data, and in-house end-to-end training infrastructure.
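To make the sparse-activation idea concrete, here is a minimal Python sketch of top-k expert routing, a common MoE technique. Microsoft has not published MAI-Thinking-1's router design, expert count, or how many experts fire per token, so the gating scheme and every name below (`active_fraction`, `top_k_route`, `moe_forward`, the toy experts) are assumptions for illustration only, not the model's actual implementation.

```python
import math


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the model's weights used for a single forward pass."""
    return active_params / total_params


def top_k_route(gate_logits, k):
    """Pick the k highest-scoring experts and softmax-normalise their scores.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    top = sorted(range(len(gate_logits)),
                 key=lambda i: gate_logits[i], reverse=True)[:k]
    peak = max(gate_logits[i] for i in top)  # subtract max for numerical stability
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, experts, gate_logits, k):
    """Run only the selected experts and blend their outputs by gate weight."""
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))


# Hypothetical toy setup: four "experts" that just scale their input.
toy_experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
toy_logits = [0.1, 2.0, -1.0, 2.0]  # in a real model a learned gate produces these

if __name__ == "__main__":
    # Microsoft's reported figures: ~35B active out of ~1T total parameters.
    print(f"active share: {active_fraction(35e9, 1e12):.1%}")
    print("routing:", top_k_route(toy_logits, k=2))
    print("output:", moe_forward(2.0, toy_experts, toy_logits, k=2))
```

In this toy setup only two of the four experts run for the input, which is the mechanism by which a model with a very large total parameter count can keep per-request compute closer to that of a much smaller dense model.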
Benchmarks (Microsoft's claims)
Microsoft reports 97.0% on AIME 2025 and 94.5% on AIME 2026, two competition-math tests of multi-step reasoning. On the SWE-Bench Pro software-engineering benchmark, Microsoft says MAI-Thinking-1 matches Claude Opus 4.6. In blind side-by-side human evaluations run by Surge, its independent rating partner, the model was preferred over Claude Sonnet 4.6, Microsoft says. These are vendor-reported figures; independent replication is not yet available.
The wider family
Alongside the reasoning model, Microsoft Foundry added MAI-Image-2.5 (with image-to-image editing and preservation controls), MAI-Voice-2 (multilingual text-to-speech with voice cloning across more than 15 languages), and MAI-Transcribe-1.5 (expanded to 43 languages with content biasing). MAI-Code-1-Flash, an inference-efficient coding model tuned for GitHub, is rolling out across GitHub Copilot plans and VS Code. Microsoft says the models already power experiences across Copilot, Bing, PowerPoint and Azure Speech.
Why it matters
Editorial analysis: this is a strategically significant move. A company of Microsoft's scale shipping a first-party flagship reasoning model that it says is competitive with Anthropic's Claude line, and training it without third-party distillation, reduces its dependence on OpenAI and reshapes enterprise sourcing options. For developers already on Azure and Microsoft tooling, native Foundry access to integrated models can shorten deployment paths, while the MoE design and Flash variants target the cost-and-latency tradeoffs that dominate production inference.
What to watch
Editorial analysis: the central question is whether independent benchmarks and third-party evaluations confirm Microsoft's parity-and-preference claims against Claude and other frontier models, particularly on reasoning and software engineering. Also watch pricing and availability as MAI-Thinking-1 moves from private preview to general availability, deeper integration into GitHub Copilot, VS Code and M365, and how the licensed-data and no-distillation provenance claims hold up where compliance matters.
Caveats
The performance numbers, training-data provenance, and head-to-head comparisons are Microsoft's own statements, some surfaced at a launch keynote; they have not been independently verified. One aggregator framed the models as outright outperforming Claude, but Microsoft's own claims are narrower - parity with Opus 4.6 on SWE-Bench Pro and a human-evaluation preference over Sonnet 4.6.
Scoring Rationale
A major cloud and productivity vendor is shipping its first in-house flagship reasoning model, trained without third-party distillation and, by Microsoft's account, competitive with Claude Opus 4.6 on coding and preferred over Sonnet 4.6 in human evals, alongside a multimodal family integrated across Copilot, GitHub and VS Code. That materially reshapes enterprise model sourcing and reduces Microsoft's dependence on OpenAI, placing it high in the major band; it stays below industry-shaking because the benchmarks are vendor-reported and claim parity rather than redefining the frontier.

