2nd MLC-SLM Official Baseline System Released | US$20,000 Prize Pool Announced

Date: 05/19/2026

Registration is under way for the 2nd Multilingual Conversational Speech Language Model Challenge (MLC-SLM Challenge 2026). Following the release of the training and development datasets, the organizers have now released the official baseline systems. Participating teams can use them for baseline reproduction, experimental validation, system development, and model optimization as the competition enters its technical implementation phase.

Competition Background

With the rapid advancement of Large Language Models (LLMs) and Speech Large Language Models (Speech LLMs), speech recognition and spoken language understanding are increasingly converging toward unified modeling. However, real-world multilingual conversational scenarios remain highly challenging, involving linguistic diversity, accent variations, speaker turn-taking, complex dialogue structures, and deep semantic understanding.

The MLC-SLM Challenge 2026 focuses on authentic multilingual conversational speech scenarios, aiming to advance Speech LLM research in areas such as speaker separation and recognition, acoustic understanding, and semantic understanding. It also provides an open evaluation platform for multilingual conversational speech language model research.

This year’s training dataset has been significantly expanded compared with the inaugural challenge, reaching approximately 2,100 hours and covering around 14 languages. It also includes a wider range of language variants and regional accents, such as Canadian French, Mexican Spanish, and Brazilian Portuguese, so the data better reflects real-world multilingual conversational speech applications.

Official Baseline System Released

The official baseline systems for both tasks in this year’s competition have now been released, enabling participating teams to quickly begin their experiments and optimization work.

Task 1: Multilingual Conversational Speech Speaker Separation and Recognition

The Task 1 baseline system is built on Microsoft’s open-source VibeVoice-ASR model and fine-tuned on the competition training dataset. For evaluation, the system uses the MeetEval toolkit to calculate the tcpMER metric: tcpCER is used for Japanese, Korean, and Thai, while tcpWER is used for all other languages.

This baseline provides a reference workflow for speaker-related modeling and speech recognition in multilingual conversational speech scenarios.
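
The per-language metric rule described above can be illustrated with a minimal Python sketch. It only shows which metric applies to which language and how a transcript might be split into character or word units before scoring. It does not implement the time-constrained, permutation-based matching behind tcpCER and tcpWER, and it does not call MeetEval. The function names, the lowercase language labels, and the whitespace-based splitting are assumptions made for illustration, not part of the official baseline.

```python
# Languages scored at character level (tcpCER), as stated in the announcement.
# The lowercase label strings are an assumption of this sketch.
CHAR_LEVEL_LANGUAGES = {"japanese", "korean", "thai"}


def metric_for(language: str) -> str:
    """Return the metric name the announcement assigns to a language."""
    if language.strip().lower() in CHAR_LEVEL_LANGUAGES:
        return "tcpCER"
    return "tcpWER"


def scoring_units(text: str, language: str) -> list[str]:
    """Split a transcript into the units that would be counted as errors.

    Hypothetical helper: character units for tcpCER languages,
    whitespace-separated words for everything else.
    """
    if metric_for(language) == "tcpCER":
        # Character level: drop whitespace, keep every other character.
        return [ch for ch in text if not ch.isspace()]
    # Word level: split on runs of whitespace.
    return text.split()


if __name__ == "__main__":
    for lang, sample in [("English", "hello world"), ("Japanese", "こんにちは 世界")]:
        print(lang, metric_for(lang), scoring_units(sample, lang))
```

Teams reproducing the official numbers should rely on the released baseline scripts and MeetEval itself, since text normalization and tokenization details there may differ from this sketch.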

Task 2: Multilingual Conversational Speech Understanding

The Task 2 baseline system first uses Gemini 2.5 Pro to generate multiple-choice questions for the training and development sets, covering both acoustic and semantic understanding. The ms-swift toolchain is then used to fine-tune the Qwen2.5-Omni-7B model.

The organizers will open-source the multiple-choice questions and corresponding answers for the development set as reference materials for participating teams. The evaluation set will be constructed in a similar multiple-choice format for speech understanding and will undergo human review before being used to determine the final Task 2 rankings.
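
For teams that want a quick local check against the development-set questions, the following Python sketch scores multiple-choice predictions. The announcement does not specify the Task 2 scoring metric or the file format of the released questions, so plain accuracy, the question-ID-to-option-letter dictionaries, and the function name are all assumptions made for illustration.

```python
def multiple_choice_accuracy(predictions: dict[str, str], answers: dict[str, str]) -> float:
    """Fraction of reference questions answered correctly.

    Assumed format: both dicts map a question ID to an option letter.
    Questions missing from `predictions` count as wrong.
    """
    if not answers:
        raise ValueError("no reference answers provided")
    correct = sum(
        1
        for question_id, gold in answers.items()
        # Compare option letters case-insensitively, ignoring stray whitespace.
        if predictions.get(question_id, "").strip().upper() == gold.strip().upper()
    )
    return correct / len(answers)


if __name__ == "__main__":
    # Hypothetical question IDs and answers, for illustration only.
    reference = {"q1": "A", "q2": "C", "q3": "B", "q4": "D"}
    predicted = {"q1": "a", "q2": "B", "q3": "B"}  # q4 left unanswered
    print(multiple_choice_accuracy(predicted, reference))  # prints 0.5
```

Counting unanswered questions as wrong keeps the denominator fixed at the number of reference questions, which makes scores comparable across systems that skip different items.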

Building on the official baseline systems, participating teams are encouraged to explore further optimization strategies across model architecture, training methods, data processing, cross-lingual generalization, accent robustness, and complex conversational understanding.

Prize Structure

The total prize pool for this competition is US$20,000, or the equivalent value in other currencies.

The prize breakdown for the top-performing teams in each task is as follows:

1st Place: US$5,000
2nd Place: US$3,000
3rd Place: US$2,000

Prizes are awarded in both tasks, giving participating teams a global competitive platform to showcase their system capabilities, technical innovation, and research achievements.

Sustained Interest from Academia and Industry

The competition continues to attract strong participation from both academia and industry, including leading global companies and renowned academic institutions. This participation reflects growing interest in multilingual speech-language model technologies across both research and real-world applications.

Whether their work focuses on speech recognition, speaker diarization, speech understanding, multimodal large language models, or multilingual data and evaluation, participants can use the MLC-SLM Challenge 2026 as an open platform for technical validation, benchmarking, and exchange among researchers, engineers, and industry teams.

Join the Challenge

Researchers, engineers, academic teams, and industry participants are warmly invited to join the MLC-SLM Challenge 2026 and contribute to the advancement of multilingual conversational speech language models.

Official Website: https://www.nexdata.ai/competition/mlc-slm
Registration Link: https://forms.gle/jfAZ95abGy4ZiNHo7