Xiaomi open-sources its first native end-to-end speech model
2025-09-19 09:16:40

On September 19, Xiaomi officially open-sourced its first native end-to-end speech model, Xiaomi-MiMo-Audio. Built on an innovative pre-training architecture and hundreds of millions of hours of training data, the model is described as the first to achieve few-shot generalization in the speech domain through in-context learning (ICL), and significant "emergent" behavior was observed during pre-training. MiMo-Audio significantly outperformed open-source models of the same parameter count on multiple standard evaluation benchmarks, including general speech understanding and spoken dialogue, achieving the best results among 7B-parameter models. On the standard test set of the audio understanding benchmark MMAU, MiMo-Audio surpassed Google's closed-source speech model, Gemini-2.5-Flash. On the Big Bench Audio S2T task, a benchmark for complex audio reasoning, MiMo-Audio also surpassed OpenAI's closed-source speech model, GPT-4o-Audio-Preview.
ASIA TECH WIRE
