Xiaomi suddenly launches a new model focused on "extreme cost-performance"; Luo Fuli: "This is just the second step on our AGI roadmap"

Wallstreetcn
2025.12.17 02:49

$0.1 per million input tokens, at speeds of 150 tokens/second! Xiaomi has suddenly launched a new model, MiMo-V2-Flash, which scores 73.4% on a key programming benchmark, comparable to DeepSeek-V3.2. Luo Fuli stated, "This is just the second step on our AGI roadmap." Morgan Stanley's analysis suggests Xiaomi intends to use the high-performance model to deeply reshape its vast "people, vehicles, and home" ecosystem.

Eleven hours ago, Xiaomi launched and open-sourced its latest mixture-of-experts (MoE) large language model, MiMo-V2-Flash, in a surprise late-night release. The model has 309 billion total parameters, of which 15 billion are active, and is released under the developer-friendly MIT license, with the model weights already published on Hugging Face.

Luo Fuli, head of the Xiaomi MiMo team, stated plainly on social media: "MiMo-V2-Flash is now online. This is just the second step on our AGI roadmap." The statement underscores Xiaomi's long-term planning and technological ambitions in AI.

From a market-impact perspective, the arrival of MiMo-V2-Flash may disrupt the existing competitive landscape of open-source AI models. Its officially announced prices of $0.1 per million input tokens and $0.3 per million output tokens, combined with an inference speed of up to 150 tokens/second, give developers and enterprises an attractive option. That could accelerate the adoption of high-performance AI in broader scenarios, especially across Xiaomi's vast "mobile x AIoT" ecosystem.
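For a sense of scale, here is a quick back-of-the-envelope calculation at the published rates; the workload figures below are hypothetical, chosen only for illustration:

```python
# Quick cost check at the published rates ($0.1 per million input tokens,
# $0.3 per million output tokens). The usage numbers are made up.

INPUT_RATE = 0.1 / 1_000_000   # dollars per input token
OUTPUT_RATE = 0.3 / 1_000_000  # dollars per output token

input_tokens, output_tokens = 10_000_000, 2_000_000  # hypothetical monthly usage
cost = input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE
print(f"${cost:.2f}")  # -> $1.60 for 12 million tokens processed
```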

Performance Comparable to DeepSeek-V3.2, and "Highly Cost-Effective"

MiMo-V2-Flash has demonstrated strong capabilities in multiple authoritative benchmark tests, with performance sufficient to compete with some top open-source and closed-source models.

According to data released by Xiaomi, in the SWE-bench Verified test measuring programming ability, MiMo-V2-Flash achieved a score of 73.4%, surpassing all known open-source models and approaching the level of top closed-source models.

In benchmarks of reasoning ability, such as the AIME 2025 mathematics competition and the GPQA-Diamond scientific-knowledge test, the model also ranked among the top two open-source models. Charts in Morgan Stanley's research report likewise show MiMo-V2-Flash to be competitive in overall performance with mainstream large models such as DeepSeek-V3.2.

MiMo-V2-Flash also performs well on increasingly important agentic tasks. It posted high scores across the telecom, retail, and airline domains of τ²-Bench, demonstrating that it can follow complex task logic and execute multi-turn interactions.

Xiaomi says this level of performance, together with the 150 tokens/second inference speed and very low operating costs, makes it one of the most cost-effective high-performance models currently available. The model is free for a limited time on the API platform, and its weights have been released on Hugging Face under the MIT license.

Technological Innovations Behind the "Extreme Cost-Performance": Unlocking Efficiency and Long-Context Capability

MiMo-V2-Flash achieves low cost and high efficiency while maintaining strong performance thanks to several key innovations in its architecture and training methods.

First is hybrid sliding-window attention. Xiaomi adopts a 5:1 hybrid ratio, pairing every 5 layers of sliding window attention (SWA) with 1 layer of global attention. This cuts the memory footprint of the KV cache (the cache that stores intermediate attention results) by nearly 6 times while still supporting ultra-long context windows of up to 256k tokens.

Luo Fuli shared engineering details in her X post: "We ultimately chose hybrid SWA. It is simple, elegant, and in our internal benchmarks its long-context reasoning outperformed other linear-attention variants." She also pointed to a counterintuitive finding: a window size of 128 tokens is the "optimal choice," and blindly expanding it to 512 degrades performance. She stressed that "sink values are indispensable."
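Putting the quoted numbers together shows where the "nearly 6 times" cache saving comes from. The sketch below is plain arithmetic, assuming the 128-token window and 256k context mentioned above; it is an estimate of the memory math, not a statement about Xiaomi's implementation:

```python
# Back-of-the-envelope check of the "nearly 6x" KV-cache saving claimed
# for the 5:1 hybrid attention layout. Sizes are in cached tokens per
# layer; window size (128) and context length (256k) come from the
# article, the rest is plain arithmetic.

CONTEXT = 256_000   # tokens held in context
WINDOW = 128        # sliding-window size quoted by Luo Fuli
SWA_PER_GLOBAL = 5  # 5 sliding-window layers for every 1 global layer

# A pure global-attention stack caches the full context in every layer.
full_cache_per_layer = CONTEXT

# In the hybrid stack, 5 of every 6 layers cache only the last 128 tokens,
# while 1 of every 6 caches the full context.
hybrid_cache_per_layer = (SWA_PER_GLOBAL * WINDOW + CONTEXT) / (SWA_PER_GLOBAL + 1)

print(f"reduction: {full_cache_per_layer / hybrid_cache_per_layer:.2f}x")
# -> reduction: 5.99x, i.e. "nearly 6 times," matching the article's claim
```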

Second is lightweight Multi-Token Prediction (MTP). The technique lets the model draft several tokens in parallel at each step, rather than generating them one by one as in conventional decoding, raising inference speed by 2 to 2.6 times.
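Xiaomi's exact decoding code is not reproduced here, but the general draft-and-verify pattern behind MTP-style speculative decoding looks roughly like the sketch below; the function names and the greedy acceptance rule are illustrative assumptions, not Xiaomi's API:

```python
# Minimal sketch of the draft-and-verify loop behind multi-token
# prediction (MTP). Both model callables are hypothetical stand-ins:
# `draft` plays the role of the lightweight MTP heads proposing k tokens
# at once; `verify` is one parallel forward pass of the main model.

from typing import Callable, List

def speculative_step(
    tokens: List[int],
    draft: Callable[[List[int], int], List[int]],        # proposes k next tokens
    verify: Callable[[List[int], List[int]], List[int]],  # main model's own picks
    k: int = 3,  # 3 MTP layers -> 3 drafted tokens per step, per the article
) -> List[int]:
    proposal = draft(tokens, k)
    # One forward pass scores all k positions in parallel; the main
    # model's greedy choices tell us how many drafted tokens to accept.
    targets = verify(tokens, proposal)
    accepted = []
    for p, t in zip(proposal, targets):
        if p != t:
            accepted.append(t)  # keep the main model's token and stop
            break
        accepted.append(p)      # drafted token confirmed, keep going
    return tokens + accepted

# If roughly 3 proposals are accepted on average, each main-model pass
# emits ~3 tokens instead of 1, which is where a ~2.5x speedup comes from.
```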

Luo Fuli revealed: "With 3 layers of MTP, we observed an average acceptance of more than 3 tokens, with coding-task speed improved by about 2.5 times." She added that the technique effectively addresses GPU idle time; although tight project timelines kept it from being fully integrated into the reinforcement learning (RL) loop, Xiaomi has open-sourced the 3-layer MTP for developers to use.

In November this year, Luo Fuli, who previously worked at DeepSeek, announced on X that she had officially joined Xiaomi as head of the MiMo team. Xiaomi MiMo is Xiaomi's core brand for advancing large-model research and development, and with Luo Fuli's arrival its direction has also been clarified: aiming at the frontier of spatial intelligence.

Training "Black Technology": Achieving Performance Alignment with 1/50 Computing Power

During the training phase, Xiaomi adopted industry-leading techniques to maximize efficiency. The model was pre-trained with FP8 mixed precision on 27 trillion tokens of data.
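As a rough illustration of what FP8 mixed precision means numerically, the sketch below quantizes a weight tensor into the narrow 8-bit range with a per-tensor scale, using stock PyTorch (the float8_e4m3fn dtype, available since PyTorch 2.1). It shows the format's core trick, not Xiaomi's actual training stack:

```python
# Per-tensor scaling for FP8: tensors are stored and multiplied in 8-bit
# floats, with a scale chosen so values fill the narrow FP8 range, while
# master weights stay in higher precision.

import torch

FP8_MAX = 448.0  # largest normal value representable in float8_e4m3fn

def to_fp8(x: torch.Tensor):
    """Quantize to FP8 with a per-tensor scale; return (fp8 tensor, scale)."""
    scale = FP8_MAX / x.abs().max().clamp(min=1e-12)
    return (x * scale).to(torch.float8_e4m3fn), scale

def from_fp8(x_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Dequantize back to float32 for high-precision accumulation."""
    return x_fp8.to(torch.float32) / scale

w = torch.randn(4096, 4096)   # master weight stays in FP32
w_fp8, s = to_fp8(w)          # 8-bit copy used for the cheap matmuls
err = (from_fp8(w_fp8, s) - w).abs().max()
print(f"max quantization error: {err:.4f}")  # small relative to weight scale
```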

More groundbreaking is the Multi-teacher Online Policy Distillation (MOPD) framework introduced in the post-training phase. According to Xiaomi, the method draws on the on-policy distillation concept from Thinking Machines, allowing the student model to receive dense reward signals from multiple expert teacher models during training. Its most significant advantage is efficiency: it requires only 1/50 of the compute of the traditional pipeline combining SFT (supervised fine-tuning) and reinforcement learning for the student model to reach the teachers' performance ceiling.
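Xiaomi has not published MOPD's exact loss, but a minimal reading of "dense reward signals from multiple teachers" in an on-policy setup might look like the sketch below; the per-token best-teacher advantage is an assumption made for illustration, not a confirmed detail of the framework:

```python
# Hedged sketch of on-policy distillation with multiple teachers: the
# student samples its own rollouts, and teachers score every token of
# those samples, giving dense (per-token) supervision rather than a
# single sequence-level reward.

import torch
import torch.nn.functional as F

def mopd_loss(student_logits, teacher_logits_list, sampled_tokens):
    """
    student_logits:      [T, V] logits on the student's own rollout
    teacher_logits_list: list of [T, V] teacher logits on the same rollout
    sampled_tokens:      [T] token ids the student actually sampled
    """
    s_logp = F.log_softmax(student_logits, dim=-1)
    s_logp_tok = s_logp.gather(-1, sampled_tokens.unsqueeze(-1)).squeeze(-1)

    # Per-token log-prob each teacher assigns to the student's tokens.
    t_logp_tok = torch.stack([
        F.log_softmax(t, dim=-1)
         .gather(-1, sampled_tokens.unsqueeze(-1)).squeeze(-1)
        for t in teacher_logits_list
    ])                                      # [num_teachers, T]
    best_t = t_logp_tok.max(dim=0).values   # most confident teacher per token

    # Dense per-token signal: how much better the best teacher rates each
    # token than the student does (a reverse-KL-style advantage).
    advantage = (best_t - s_logp_tok).detach()
    return -(advantage * s_logp_tok).mean()
```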

Luo Fuli pointed out that this framework lays the foundation for building a "self-reinforcing feedback loop," where today's student model can evolve into a stronger teacher model tomorrow, thus achieving continuous and efficient iteration of the model.

Xiaomi's AI Landscape: From Smartphones to AGI

The release of MiMo-V2-Flash is not an isolated technical demonstration but an important piece of Xiaomi's AI strategy. As Luo Fuli put it, this is only the "second step" on the AGI roadmap, hinting at deeper moves to come.

The move signals that Xiaomi is fully committed to making AI one of its core competitive advantages. A Morgan Stanley report says the release "demonstrates Xiaomi's commitment to AI research and development" and expects the company to make more substantive progress in both cloud AI and edge AI. Strong self-developed AI foundations would bring differentiated intelligent experiences to its smartphones, IoT devices, and even its new energy vehicles, building a deeper ecosystem moat.

Morgan Stanley believes the launch of MiMo-V2-Flash may reshape the open-source AI model market; it also reveals Xiaomi's strategic ambition to deeply empower its "people, vehicles, and home" ecosystem with self-developed AI technology.

Fourteen years ago, Xiaomi redefined the flagship smartphone market with a 1,999-yuan price tag. Now, with MiMo-V2-Flash's strong performance and disruptive cost, Xiaomi appears to be hoping for a new "Xiaomi moment" in open-source AI.

The model can be tried here: https://aistudio.xiaomimimo.com/#/