

MiMo-V2-Flash is a powerful, efficient, and ultra-fast foundation language model that particularly excels in reasoning, coding, and agentic scenarios, while also serving as an excellent general-purpose assistant for everyday tasks. It is a Mixture-of-Experts model with 309B total parameters and 15B active parameters, adopting a hybrid attention architecture that interleaves sliding-window and full attention.
The model supports a hybrid thinking mode, letting users toggle whether the model reasons step by step or answers instantly. It can generate functional HTML webpages with one click and works seamlessly with vibe-coding scaffolds such as Claude Code, Cursor, and Cline. MiMo-V2-Flash offers an ultra-long 256k context window, enabling it to complete tasks spanning hundreds of rounds of agent interactions and tool calls. It delivers fast inference at 150 tokens per second while maintaining a low cost of $0.1 per million input tokens and $0.3 per million output tokens.
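As a rough illustration of how this might look from a client's perspective, the sketch below assumes an OpenAI-compatible chat endpoint; the base URL, model id, and the "thinking" request field are all placeholders and assumptions rather than documented API, and the cost estimate simply applies the quoted per-token prices.

```python
# Hypothetical usage sketch; endpoint URL, model id, and the "thinking" field are assumptions.
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_KEY")

resp = client.chat.completions.create(
    model="mimo-v2-flash",                           # placeholder model id
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    extra_body={"thinking": True},                   # assumed name for the thinking-mode toggle
)
print(resp.choices[0].message.content)

# Back-of-envelope cost at the quoted $0.1 / $0.3 per million tokens.
in_tok, out_tok = resp.usage.prompt_tokens, resp.usage.completion_tokens
cost = in_tok * 0.1 / 1e6 + out_tok * 0.3 / 1e6
print(f"request cost ~= ${cost:.6f}")
```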
The model adopts a 1:5 hybrid of Global Attention and Sliding Window Attention with an aggressive 128-token sliding window. It uses Multi-Token Prediction (MTP) training to boost capabilities and verifies the MTP draft tokens in parallel during inference. The MTP block uses a dense FFN and SWA to limit parameter count and reduce KV cache costs, achieving an accepted length of 2.8–3.6 tokens and an effective speedup of 2.0–2.6×.
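A minimal sketch of the two ideas above, not the released implementation: interleaving one global-attention layer per five sliding-window layers, a 128-token sliding-window mask, and a back-of-envelope MTP speedup estimate. The function names are illustrative, and the draft/verification overhead value is an assumption back-calculated so the reported accepted lengths map onto the reported speedups.

```python
import torch

def layer_pattern(num_layers: int, swa_per_global: int = 5) -> list[str]:
    """One global-attention layer after every `swa_per_global` SWA layers (1:5 hybrid)."""
    return [
        "global" if (i + 1) % (swa_per_global + 1) == 0 else "swa"
        for i in range(num_layers)
    ]

def sliding_window_mask(seq_len: int, window: int = 128) -> torch.Tensor:
    """Causal mask where each query attends only to the previous `window` tokens."""
    i = torch.arange(seq_len).unsqueeze(1)   # query positions
    j = torch.arange(seq_len).unsqueeze(0)   # key positions
    return (j <= i) & (j > i - window)       # True = attend

def mtp_speedup(accepted_len: float, step_overhead: float = 0.4) -> float:
    """Rough effective speedup: accepted tokens per step divided by the relative
    cost of one verification step plus the cheap MTP draft pass (assumed ~0.4)."""
    return accepted_len / (1.0 + step_overhead)

print(layer_pattern(12))                         # ['swa', 'swa', ..., 'global', ...]
print(sliding_window_mask(6, window=3).int())    # toy 6-token example
print(round(mtp_speedup(3.2), 2))                # ~2.3x, within the reported 2.0-2.6x range
```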
MiMo-V2-Flash demonstrates strong reasoning ability, ranking among the top two open-source models on the math competition benchmark AIME 2025 and the scientific knowledge benchmark GPQA-Diamond. On the SWE-bench Verified and SWE-bench Multilingual software engineering benchmarks, it takes the #1 spot among open-source models and is on par with the world's top closed-source models, scoring 73.4% on SWE-bench Verified and 71.7% on SWE-bench Multilingual.
The model is built for reasoning, coding, and agentic scenarios, and also serves as an assistant for everyday tasks. It is available globally on Hugging Face, API Platform, and AI Studio. Model weights, including MiMo-V2-Flash-Base, are available on Hugging Face under the MIT license, with inference code contributed to SGLang.
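For local serving, one plausible path is SGLang's server plus its native generate endpoint, sketched below under assumptions: the Hugging Face repo id is a placeholder, and the sampling parameters are illustrative defaults rather than recommended settings.

```python
# First launch the server (repo id below is a placeholder for the published weights):
#   python -m sglang.launch_server --model-path <hf-repo-for-MiMo-V2-Flash> --port 30000
# Then query SGLang's native /generate endpoint:
import requests

resp = requests.post(
    "http://localhost:30000/generate",
    json={
        "text": "Write a Python function that checks whether a string is a palindrome.",
        "sampling_params": {"max_new_tokens": 256, "temperature": 0.6},
    },
)
print(resp.json())
```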
MiMo-V2-Flash is built for developers and professionals who need advanced AI capabilities for reasoning, coding, and agentic scenarios. It serves software engineers who want a top-performing open-source model for coding tasks, researchers who need strong reasoning for mathematical and scientific work, and users seeking a general-purpose AI assistant for everyday tasks and idea exchange.