Products & Toolscoding agentsmi moanthropic claudelong horizon

Xiaomi MiMo Code claims edge over Claude Code on 200-step tasks

|June 14, 2026|By LDS Team

6.8

Relevance Score

Xiaomi MiMo Code claims edge over Claude Code on 200-step tasks

Xiaomi's MiMo Code is the most direct open-source attack yet on Claude Code's long-horizon niche, and its architecture - not its benchmark table - is the part practitioners should study. Xiaomi released MiMo Code V0.1 in mid-June 2026 under an MIT license: a terminal-native coding agent forked from OpenCode, with a persistent memory system (project memory, session checkpoints, task-progress logs) maintained by an independent record-keeping subagent. In Xiaomi's own controlled experiment with both harnesses running the same MiMo model, MiMo Code scored 73% vs Claude Code's 68% on Terminal-Bench 2 and 62% vs 57% on SWE-Bench Pro; some launch coverage circulated larger gaps (86.7% vs 65.4%) from different configurations. Xiaomi also cites a 576-developer internal A/B evaluation where MiMo Code's win rate passed 65% beyond 200 steps. All figures are vendor-run and await independent replication.

Why this matters

The endurance problem - agents degrading as tasks stretch past dozens of steps - is the current frontier of practical coding-agent engineering. MiMo Code's design answers it with engineering rather than model scale: a dedicated subagent owns record-keeping, checkpointing state and summarizing context before the window fills, so the main agent never starts from scratch. That pattern is portable to any agent framework, which makes this release worth attention even from teams that will never run a Xiaomi model.

What happened

Xiaomi released and open-sourced MiMo Code V0.1 under an MIT license, a terminal-native AI coding agent built as a fork of the open-source OpenCode project. Launch coverage from The New Stack and VentureBeat dates the release to June 10-11, 2026; Xiaomi's official announcement page describes the release and its architecture. The agent installs via a single terminal command or npm on Windows, ships with Xiaomi's MiMo-V2.5 model built in (free for a limited time, and described by Xiaomi as comparable to Claude Sonnet 4.6), and supports third-party models including DeepSeek, Kimi, and GLM.

The benchmark numbers - read them carefully

Xiaomi's official announcement reports a controlled experiment in which MiMo Code and Claude Code ran the exact same MiMo model, isolating the harness itself: MiMo Code scored 73% vs Claude Code's 68% on Terminal-Bench 2, and 62% vs 57% on SWE-Bench Pro. Some launch coverage circulated larger figures (86.7% vs 65.4% on Terminal-Bench 2.0, and 57.2% on SWE-bench Pro) drawn from different configurations; accounts differ, and the official controlled-experiment numbers are the conservative, like-for-like comparison. Separately, Xiaomi cites an internal beta with a human A/B evaluation across 576 developers working in real private repositories, where MiMo Code's win rate against Claude Code rose above 65% once tasks passed 200 execution steps. Every figure here is vendor-run; none has been independently replicated yet.

Technical details

The memory system uses layered persistence - project memory, conversation checkpoints, and per-task progress logs - and VentureBeat's teardown documents an implementation built on SQLite FTS5 full-text search, including a scratch-notes layer. An independent subagent saves state automatically and produces a clean summary when the context window nears capacity, so the main agent continues instead of restarting. A /dream command runs every seven days, merging, deduplicating, and compressing accumulated memory into a compact current state. A Compose mode (triggered with the Tab key) runs a full design-plan-code-test-review loop from a single instruction.

Context and significance

Harness-versus-harness competition is displacing model-versus-model comparison in coding agents; GitHub's recent Copilot token-efficiency work points the same direction. An MIT-licensed, documented memory stack lowers the cost of experimenting with these ideas, and OpenCode's permissive base makes vertical forks straightforward. The obvious caveats: vendor benchmarks routinely flatter the vendor, Terminal-Bench and SWE-Bench Pro configurations materially affect scores, and long-horizon win rates from an internal beta are not a public benchmark.

What to watch

•Independent Terminal-Bench 2 runs pitting MiMo Code against Claude Code and Codex CLI under matched configurations
•Community adoption of the memory-subagent pattern in other OpenCode forks and agent frameworks
•Whether Xiaomi publishes the evaluation harness behind its 576-developer A/B study

Key Points

1Xiaomi open-sourced MiMo Code V0.1 (MIT license), an OpenCode fork with subagent-maintained persistent memory targeting 200-plus-step coding workflows.
2Xiaomi's same-model controlled runs show 73% vs 68% on Terminal-Bench 2 and 62% vs 57% on SWE-Bench Pro; larger circulated gaps reflect different configurations.
3The transferable idea is outsourcing memory to a dedicated subagent rather than relying on the model to take notes - worth testing in any agent stack.

Scoring Rationale

Notable MIT-licensed coding-agent release whose official controlled experiment (same model in both harnesses) shows a consistent harness advantage: 73 vs 68 on Terminal-Bench 2 and 62 vs 57 on SWE-Bench Pro. Directly relevant to practitioners building agentic tooling, but all figures are Xiaomi-run, larger circulated gaps come from differing configurations, and independent replication is pending, so the score stays at 6.8.

MoreAI Agents news

Sources

Public references used for this report.

8 sources

mimo.mi.comMiMo Code Released and Open-Sourced | Model Agent Collaborative Optimization, Stepping Towards the Self-Evolution Era

mimo.xiaomi.comMiMo Code: Scaling Coding Agents to Long-Horizon Tasks

thenewstack.ioXiaomi's MiMo Code claims it beats Claude Code past 200 steps

View 5 more sources

Practice interview problems based on real data

1,625 SQL & Python problems across 15 industry datasets — the exact type of data you work with.

Try 250 free problems

Why this matters

What happened

The benchmark numbers - read them carefully

Technical details

Context and significance

What to watch

•Independent Terminal-Bench 2 runs pitting MiMo Code against Claude Code and Codex CLI under matched configurations
•Community adoption of the memory-subagent pattern in other OpenCode forks and agent frameworks
•Whether Xiaomi publishes the evaluation harness behind its 576-developer A/B study

Key Points

1Xiaomi open-sourced MiMo Code V0.1 (MIT license), an OpenCode fork with subagent-maintained persistent memory targeting 200-plus-step coding workflows.

2Xiaomi's same-model controlled runs show 73% vs 68% on Terminal-Bench 2 and 62% vs 57% on SWE-Bench Pro; larger circulated gaps reflect different configurations.

3The transferable idea is outsourcing memory to a dedicated subagent rather than relying on the model to take notes - worth testing in any agent stack.

Scoring Rationale

Xiaomi MiMo Code claims edge over Claude Code on 200-step tasks

Why this matters

What happened

The benchmark numbers - read them carefully

Technical details

Context and significance

What to watch

Key Points

Scoring Rationale

Sources

More AI & Data Science News

BCG Ranks Singapore 11th and Hong Kong 36th in Intelligent Cities Index

Review Maps 21 Studies of Generative AI Mental Health Chatbots

Michaels Launches Gemini-Powered Ask Mike Assistant

Researchers Generalize Temporal Difference Error Modeling

Xiaomi MiMo Code claims edge over Claude Code on 200-step tasks

Why this matters

What happened

The benchmark numbers - read them carefully

Technical details

Context and significance

What to watch

Key Points

Scoring Rationale

Sources

More AI & Data Science News

BCG Ranks Singapore 11th and Hong Kong 36th in Intelligent Cities Index

Review Maps 21 Studies of Generative AI Mental Health Chatbots

Michaels Launches Gemini-Powered Ask Mike Assistant

Researchers Generalize Temporal Difference Error Modeling