Challenges of Code-switch Speech Recognition

From: Nexdata Date: 2024-04-02

Speech recognition technology enables computers to understand human speech and supports a variety of voice interaction scenarios, such as mobile phone applications, human-vehicle collaboration, robot dialogue, and voice transcription.

However, the input in these scenarios is not always in a single language; sometimes several languages are mixed. For example, Chinese speakers often use English terminology in otherwise Chinese speech, which poses new challenges for speech recognition technology.

Code-switch Speech Recognition Challenges

1. Similar Pronunciation

Recognizing mixed Chinese and English speech requires a single model to learn the sounds of both languages. Pronunciations that are similar across the two languages but carry different meanings usually increase model complexity and computation. Because the model must distinguish and process these similar pronunciations, its modeling units need to be defined separately for each language.

2. Scarce Training Data

Far less Chinese-English mixed data is available than single-language data. At present, open-source single-language datasets such as WenetSpeech (Chinese) and GigaSpeech (English) have reached the 10,000-hour level, but SEAME and TAL_CSASR are the only two open-source datasets for mixed Chinese-English speech recognition.

Nexdata Code-switch Speech Recognition Data Solutions

1,535 Hours - Mixed Speech with Chinese and English Data by Mobile Phone

This dataset was recorded by 3,972 native Chinese speakers with accents covering seven major dialect areas. The recorded text is a mixture of Chinese and English sentences, covering general scenes and human-computer interaction scenes. It is rich in content and accurately transcribed. It can be used to improve a speech recognition system's performance on Chinese-English mixed read speech.

303 Hours - Mixed Speech with Chinese and English Data by Mobile Phone

This dataset was recorded by 1,113 native Chinese speakers with accents covering seven major dialect areas. The recorded text is a mixture of Chinese and English sentences, covering general scenes and human-computer interaction scenes. It is rich in content and accurately transcribed. It can be used to improve a speech recognition system's performance on Chinese-English mixed read speech.

300 Hours - Mixed Speech with Korean and English Data by Mobile Phone

This dataset was recorded by native Korean speakers. The recorded text is a mixture of Korean and English sentences, covering general scenes and human-computer interaction scenes. It is rich in content and accurately transcribed. It can be used to improve a speech recognition system's performance on Korean-English mixed read speech.
