Xiaomi MiMo Code claims edge over Claude Code on 200-step tasks

Multiple outlets, including The New Stack, VentureBeat, and CryptoBriefing, report that Xiaomi's MiMo team released MiMo Code V0.1, an open-source, terminal-native coding agent, on June 10-11, 2026 under an MIT license. Xiaomi's benchmarks show MiMo Code scoring 86.7% on Terminal-Bench 2.0 versus Claude Code's 65.4%, and 57.2% on SWE-bench Pro, reportedly outperforming Claude Opus variants. A human A/B evaluation of 576 developers found MiMo Code's win rate exceeding 65% at 200 execution steps, and Xiaomi separately reports 40-60% fewer tokens than Claude Code across agentic scenarios. The agent is built on MiMo-V2.5-Pro, a 1.02-trillion-parameter MoE model with a 1-million-token context window, and extends the open-source OpenCode agent with a four-layer cross-session memory using SQLite FTS5. All performance figures are Xiaomi-reported and await independent verification.
What Happened
The New Stack and VentureBeat report that Xiaomi's MiMo team released MiMo Code V0.1, a terminal-native AI coding agent, on June 10-11, 2026 under an MIT license on GitHub. Per VentureBeat and Xiaomi's blog post, the agent installs via a single terminal command or npm on Windows. VentureBeat reports the announcement referenced an internal beta and a human A/B evaluation of 576 developers; past 200 execution steps, MiMo Code's reported win rate exceeded 65%.
Benchmark Results (Xiaomi-Reported)
CryptoBriefing reports that on Terminal-Bench 2.0, a benchmark designed to stress AI coding agents on complex multi-step terminal tasks, MiMo Code scored 86.7% versus Claude Code's 65.4%, a gap of 21.3 percentage points. On SWE-bench Pro, which tests resolution of real-world GitHub issues, MiMo Code achieved 57.2%, outperforming Claude Opus variants according to the same report. Across agentic scenarios, MiMo Code reportedly used 40-60% fewer tokens than Claude Code. All of these figures come from Xiaomi's own published results and have not been independently replicated.
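The arithmetic behind these figures is easy to check. The short Python sketch below recomputes the absolute gap from the two reported Terminal-Bench 2.0 scores and also derives a relative gain, which is our own calculation and not a number any outlet reported. The 1,000,000-token baseline in the usage lines is purely hypothetical, chosen only to show what a 40-60% saving would mean in absolute terms.

```python
# Scores as reported by Xiaomi (percent of Terminal-Bench 2.0 tasks passed).
MIMO_TB2 = 86.7
CLAUDE_TB2 = 65.4


def gap_points(a: float, b: float) -> float:
    """Absolute difference in percentage points, rounded to one decimal."""
    return round(a - b, 1)


def relative_gain_pct(a: float, b: float) -> float:
    """Gain of a over b as a percentage of b (derived here, not source-reported)."""
    return round((a - b) / b * 100, 1)


def token_range(baseline_tokens: int, low_saving: float = 0.40,
                high_saving: float = 0.60) -> tuple[int, int]:
    """Tokens remaining after the reported 40-60% saving on a given baseline."""
    return (round(baseline_tokens * (1 - high_saving)),
            round(baseline_tokens * (1 - low_saving)))


print(gap_points(MIMO_TB2, CLAUDE_TB2))         # 21.3
print(relative_gain_pct(MIMO_TB2, CLAUDE_TB2))  # 32.6
print(token_range(1_000_000))                   # hypothetical baseline: (400000, 600000)
```

The difference between the two framings matters when comparing headlines: 21.3 points is the absolute gap, while roughly 33% is the same gap expressed relative to Claude Code's score.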
Technical Details (Reported)
VentureBeat reports MiMo Code is a fork of the open-source OpenCode agent extended with a memory architecture, workflow modes, and a model harness. VentureBeat documents a four-layer cross-session memory implemented with SQLite FTS5 full-text search: project memory (a persistent MEMORY.md), session checkpoints, scratch notes, and per-task progress logs. The model powering the agent is MiMo-V2.5-Pro, a mixture-of-experts (MoE) architecture with approximately 1.02 trillion parameters and a 1-million-token context window, per CryptoBriefing. Notably, CryptoBriefing reports that MiMo Code's terminal also integrates Claude Code via API, so developers can route subtasks to Claude without leaving the MiMo environment.
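To make the memory design concrete, here is a minimal Python sketch of a layered note store backed by SQLite FTS5. It is an illustration under stated assumptions, not MiMo Code's actual implementation: the table name `memory`, the layer labels, and the functions `open_memory`, `remember`, and `recall` are all hypothetical, and the sketch assumes the local SQLite build has FTS5 compiled in (common, but not guaranteed).

```python
import sqlite3

# Hypothetical labels for the four layers described in the reporting.
LAYERS = ("project", "checkpoint", "scratch", "progress")


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database holding one FTS5 table for all memory layers."""
    conn = sqlite3.connect(path)
    # layer and session_id are stored but excluded from the full-text index.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS memory "
        "USING fts5(layer UNINDEXED, session_id UNINDEXED, body)"
    )
    return conn


def remember(conn: sqlite3.Connection, layer: str, session_id: str, body: str) -> None:
    """Store one note in the given layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    conn.execute(
        "INSERT INTO memory (layer, session_id, body) VALUES (?, ?, ?)",
        (layer, session_id, body),
    )
    conn.commit()


def recall(conn: sqlite3.Connection, query: str, limit: int = 3) -> list[tuple]:
    """Return the best-matching notes across all sessions, best match first."""
    rows = conn.execute(
        "SELECT layer, session_id, body FROM memory "
        "WHERE memory MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


conn = open_memory()
remember(conn, "project", "s1", "Database migrations live in db/migrate and use alembic")
remember(conn, "scratch", "s2", "Flaky test in test_auth was retried twice")
print(recall(conn, "alembic"))
```

Because notes from earlier sessions stay in the same table, a later session can query them by keyword instead of replaying the full history, which is the cross-session property the reporting describes.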
Editorial Analysis - Technical Context
Long-horizon workflows in similar agentic systems tend to stress both context-window limits and state management. A persistent memory stack combined with FTS5 retrieval is a common mitigation: it reduces the volume of context that must pass through the model on each call, which could plausibly account for the reported token savings. The Terminal-Bench 2.0 result has been repeated by multiple outlets, but repetition is not corroboration: the figure originated from Xiaomi's own evaluation. Until third parties run the same tasks against both tools, the gap cannot be treated as a settled performance claim.
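The token-saving argument can be illustrated with a small Python sketch: rather than sending the whole history on every call, an agent keeps only as many retrieved notes as fit a fixed budget. This is a generic greedy packing routine, not anything documented for MiMo Code. The function name `pack_context` is hypothetical, and counting whitespace-separated words is a crude stand-in for a real tokenizer.

```python
def pack_context(snippets: list[str], budget_tokens: int) -> tuple[list[str], int]:
    """Keep snippets in ranked order, skipping any that would exceed the budget."""
    chosen: list[str] = []
    used = 0
    for text in snippets:
        cost = len(text.split())  # assumption: one word is roughly one token
        if used + cost > budget_tokens:
            continue  # too large for the remaining budget; try the next one
        chosen.append(text)
        used += cost
    return chosen, used


ranked = [
    "migrations use alembic",                       # 3 words
    "auth tests are flaky under parallel runs",     # 7 words
    "use pnpm",                                     # 2 words
]
print(pack_context(ranked, budget_tokens=5))
# (['migrations use alembic', 'use pnpm'], 5)
```

The budget caps per-call context regardless of how long the session has run, which is why retrieval-based memory is expected to matter more as step counts grow.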
Context and Significance
Benchmark competition between agentic coding tools is accelerating. Vendor-run evaluations on curated task sets often show large gaps that narrow under community or academic scrutiny. For practitioners, the combination of an open-source, MIT-licensed harness, a documented memory architecture, and a large-context model lowers the barrier to experimentation. The explicit Claude Code API integration within the MiMo terminal is also worth noting - it signals a strategy of compatibility with incumbent tools rather than pure replacement.
What to Watch
- Independent evaluations replicating Xiaomi's Terminal-Bench 2.0 and SWE-bench Pro results using standardised task sets.
- Community uptake and forks on GitHub that test MiMo-V2.5-Pro across diverse languages and real-world repos.
- Token and latency trade-offs when combining a 1-trillion-parameter MoE model with SQLite FTS5 memory in CI environments.
Scoring Rationale
Notable open-source coding-agent release with documented benchmark comparisons (Terminal-Bench 2.0: 86.7% vs 65.4%; SWE-bench Pro: 57.2%) and a large-context MoE model - directly relevant to AI/DS/ML practitioners evaluating agentic tooling. Score held at 6.8 because all performance figures are Xiaomi-reported and independent verification is pending; confirmed independent results or community replication would push this higher.


