Challenges of Code-switch Speech Recognition

From: Nexdata Date: 2024-08-15

➤ Speech recognition with multiple languages

The era of data-driven artificial intelligence has arrived. The quality of data directly affects the effectiveness and intelligence of the model. In this wave of technological change, datasets in various vertical fields are constantly emerging to meet the needs of machine learning in different scenarios. Whether it is computer vision, natural language processing or behavioral analysis, various datasets contain huge commercial value and technical potential.

Speech recognition technology enables computers to understand human speech, supporting a variety of voice interaction scenarios such as mobile phone applications, human-vehicle collaboration, robot dialogue, and voice transcription.

➤ Code-switch speech recognition

However, in these scenarios the input for speech recognition is not always a single language; sometimes it is a mixture of several. For example, Chinese speakers often use English terminology in otherwise Chinese sentences, which brings new challenges to speech recognition technology.

Code-switch Speech Recognition Challenges

1. Similar Pronunciation

Chinese-English speech recognition requires a single model to learn the speech sounds of both languages. Pronunciations that sound similar but carry different meanings usually increase model complexity and computation. Because the model must distinguish and process similar pronunciations across languages, the modeling units need to be defined separately for each language.
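As a rough illustration of language-specific modeling units, the sketch below splits a mixed transcript into per-language units. It assumes one common convention that the article itself does not specify: Chinese is modeled per character and English per lowercased word. The function names `to_modeling_units` and `count_switch_points` and the sample sentence are hypothetical, chosen only for this example.

```python
import re

# Assumption: a CJK Unified Ideograph is one Chinese unit; a run of Latin
# letters (with an optional apostrophe part, e.g. "don't") is one English unit.
_UNIT = re.compile(r"[\u4e00-\u9fff]|[A-Za-z]+(?:'[A-Za-z]+)?")


def to_modeling_units(text):
    """Return a list of (language, unit) pairs for a mixed transcript."""
    units = []
    for match in _UNIT.finditer(text):
        piece = match.group(0)
        if "\u4e00" <= piece[0] <= "\u9fff":
            units.append(("zh", piece))          # one Chinese character
        else:
            units.append(("en", piece.lower()))  # one English word
    return units


def count_switch_points(units):
    """Count positions where the language changes between adjacent units."""
    return sum(1 for a, b in zip(units, units[1:]) if a[0] != b[0])


sample = "我们明天开个 meeting 讨论 deadline"
units = to_modeling_units(sample)
print(units)
print(count_switch_points(units))
```

Tagging each unit with its language keeps the two inventories separate, so similar-sounding units from different languages are never merged into one label. A production system might use subword units for English instead of whole words; that choice is outside what the article describes.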

2. Scarce Training Data

➤ Mixed-language speech data

Chinese-English mixed data is much scarcer than single-language data. Open-source single-language speech recognition datasets such as WenetSpeech (Chinese) and GigaSpeech (English) have reached the 10,000-hour level, whereas the only open-source Chinese-English mixed speech recognition datasets are SEAME and TAL_CSASR.

Nexdata Code-switch Speech Recognition Data Solutions

1,535 Hours - Mixed Speech with Chinese and English Data by Mobile Phone

This dataset is recorded by 3,972 native Chinese speakers with accents covering seven major dialect areas. The recorded text is a mixture of Chinese and English sentences, covering general scenes and human-computer interaction scenes. It is rich in content and accurately transcribed. It can be used to improve a speech recognition system's accuracy on read Chinese-English mixed speech.

303 Hours - Mixed Speech with Chinese and English Data by Mobile Phone

The data is recorded by 1,113 native Chinese speakers with accents covering seven major dialect areas. The recorded text is a mixture of Chinese and English sentences, covering general scenes and human-computer interaction scenes. It is rich in content and accurately transcribed. It can be used to improve a speech recognition system's accuracy on read Chinese-English mixed speech.

300 Hours - Mixed Speech with Korean and English Data by Mobile Phone

The data is recorded by native Korean speakers. The recorded text is a mixture of Korean and English sentences, covering general scenes and human-computer interaction scenes. It is rich in content and accurately transcribed. It can be used to improve a speech recognition system's accuracy on read Korean-English mixed speech.

High-quality datasets are the foundation of successful artificial intelligence. All industries therefore need to keep investing in data infrastructure to ensure the accuracy and diversity of data collection. From smart cities to precision medicine, from educational equality to environmental protection, the future potential of AI will be bound up with data systems that provide momentum for society and the economy.
