Xiaomi open-sources its first native end-to-end speech model

2025-09-19 09:16:40

On September 19, Xiaomi officially open-sourced Xiaomi-MiMo-Audio, its first native end-to-end speech model. Built on a new pre-training architecture and hundreds of millions of hours of training data, the model achieved few-shot generalization in the speech domain through in-context learning (ICL) for the first time, and clear "emergent" behavior was observed during pre-training. MiMo-Audio significantly outperformed open-source models of the same parameter count on multiple standard benchmarks, including general speech understanding and spoken dialogue, achieving the best performance at the 7B scale. On the standard test set of MMAU, an audio understanding benchmark, MiMo-Audio surpassed Gemini-2.5-Flash, Google's closed-source speech model. On the Big Bench Audio S2T task, a benchmark for complex audio reasoning, it also surpassed GPT-4o-Audio-Preview, OpenAI's closed-source speech model.
