From: Nexdata    Date: 2024-08-15
In the development of modern artificial intelligence, datasets are the starting point of model training and the key to improving algorithm performance. Whether it is computer vision data for autonomous driving or audio data for emotion analysis, high-quality datasets provide the foundation for more accurate predictions. By leveraging these datasets, developers can better optimize the performance of AI systems to cope with complex real-life demands.
Korean is the official language of North Korea and South Korea, two countries on the Korean peninsula whose main ethnic group is the Korean people. It is an agglutinative language: its natural vocabulary is built from a large number of morphemes, and the language exhibits rich phonological changes.
Agglutinative languages form a category in linguistic morphology that relies heavily on attaching morphemes to stems to express grammatical relations. Since mainstream speech recognition technology has largely been developed for analytic or weakly inflected languages, these agglutinative characteristics pose many challenges for it.
Korean Speech Recognition Challenges
The first challenge mainly concerns the language model. The natural text unit of Korean is the space-delimited word phrase (eojeol), whose length is not fixed: it may be a content word plus one or more particles, or a content word alone, and a single unit can correspond to several English words. Because of the agglutinative nature of the language, Korean words have a large number of inflected forms; the common ones alone can reach the millions, far exceeding the dictionary size of conventional speech recognition systems.
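To make the scale of this problem concrete, the following minimal sketch in Python contrasts a word-level lexicon with a morpheme-level one. The romanized stems and particles are purely illustrative assumptions and are not drawn from any Nexdata corpus; real systems typically tackle the explosion with morpheme- or subword-level modeling for exactly this reason.

    from itertools import product

    # Illustrative (hypothetical) noun stems and particles in romanized form.
    stems = ["hakgyo", "chingu", "seonsaengnim"]
    particles = ["", "-ga", "-eul", "-e", "-eseo", "-ege"]

    # Word-level lexicon: one entry per surface form (stem + particle combination).
    word_forms = {stem + p for stem, p in product(stems, particles)}

    # Morpheme-level lexicon: stems and particles modeled as separate units.
    morpheme_units = set(stems) | {p for p in particles if p}

    print(len(word_forms))      # 18 surface forms from only 3 stems
    print(len(morpheme_units))  # 8 subword units cover the same forms

With tens of thousands of stems and many possible particle and ending combinations, the same multiplicative growth quickly reaches the millions of forms mentioned above, which is why the word list cannot simply be enumerated in a conventional lexicon.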
The second challenge is acoustic model modeling: the agglutinative properties lead to severe co-articulation, which greatly increases the confusability of the acoustic model. This confusability can be reduced by introducing the concept of isotopes, but experiments have shown that although the method is fairly effective with monophone modeling units, its effect on the triphone units used in conventional speech recognition systems is not ideal.
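The gap between monophone and triphone units is easy to see from the sheer number of context-dependent units involved. The sketch below uses an assumed phone-inventory size of 40, which is only illustrative and not an actual Korean phone set.

    # Back-of-the-envelope count of context-dependent modeling units.
    n_phones = 40                   # assumed phone inventory size (illustrative)

    monophone_units = n_phones      # one unit per phone, context ignored
    triphone_units = n_phones ** 3  # one unit per (left context, phone, right context)

    print(monophone_units)  # 40
    print(triphone_units)   # 64000 possible triphones before state tying

Any technique that reshapes the unit inventory to absorb co-articulation therefore has far more contexts to cover, and to keep distinct, at the triphone level than at the monophone level.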
Nexdata's Korean Speech Recognition Data Solution
516 Hours - Korean Speech Data by Mobile Phone
The 516 Hours - Korean Speech Data by Mobile Phone consists of natural conversations collected over the phone from more than 1,077 native speakers, with roughly half an hour of speech per speaker, and was developed with a proper balance of gender ratio and geographical distribution. The recording devices are various mobile phones. The audio format is 16kHz, 16bit, uncompressed WAV, and all the speech data was recorded in quiet indoor environments. All the speech audio was manually transcribed with text content, the start and end time of each effective sentence, and speaker identification. The accuracy rate of sentences is ≥ 95%.
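As a simple illustration of how a downstream pipeline might consume audio with these specifications, the sketch below checks the 16kHz, 16bit WAV properties with the Python standard library and parses one annotation line. The file name and the tab-separated segment layout (start time, end time, speaker ID, text) are assumptions made for the example, not Nexdata's actual delivery format.

    import wave

    def check_wav_specs(path: str) -> None:
        # Verify the audio matches the advertised format: 16 kHz, 16-bit WAV.
        with wave.open(path, "rb") as wav:
            assert wav.getframerate() == 16000, "expected 16 kHz sample rate"
            assert wav.getsampwidth() == 2, "expected 16-bit samples"

    def parse_segment_line(line: str) -> dict:
        # Assumed tab-separated layout: start_sec, end_sec, speaker_id, text.
        start, end, speaker, text = line.rstrip("\n").split("\t", 3)
        return {"start": float(start), "end": float(end),
                "speaker": speaker, "text": text}

    check_wav_specs("sample_0001.wav")  # hypothetical file name
    print(parse_segment_line("0.52\t3.87\tSPK001\t안녕하세요"))  # hypothetical line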
500 Hours - Korean Conversational Speech Data by Mobile Phone
The 500 Hours - Korean Conversational Speech Data by Mobile Phone collected by phone involved more than 700 native speakers, developed with a proper balance of gender ratio. Speakers would choose a few familiar topics out of the given list and start conversations to ensure the dialogue's fluency and naturalness. The recording devices are various mobile phones. The audio format is 16kHz, 16bit, uncompressed WAV, and all the speech data was recorded in quiet indoor environments. All the speech audio was manually transcribed with text content, the start and end time of each effective sentence, and speaker identification. The accuracy rate of sentences is ≥ 95%.
393 Hours - Korean Children Speech Data by Mobile Phone
Mobile-phone-captured audio data of Korean children, with a total duration of 393 hours. The 1,085 speakers are children aged 6 to 15, and the recorded text contains common children's language such as essays, stories, and numbers. All sentences are manually transcribed with high accuracy.
197 Hours - Korean Speech Data by Mobile Phone_Reading
The dataset collects speech from 291 Korean locals, recorded in a quiet indoor environment. The recorded content covers economics, entertainment, news, colloquial expressions, numbers, and letters, with 400 sentences per speaker. Recording devices are mainstream Android phones and iPhones.
211 people - Korean Speech Data by Mobile Phone_Guiding
The dataset collects speech from 211 Korean locals (99 females, 112 males), recorded in a quiet indoor environment. Recording devices are mainstream Android phones and iPhones.
Data quality plays a vital role in the development of artificial intelligence. In the future, as AI technology continues to develop, the collection, cleaning, and annotation of datasets will become more complex and more crucial. By continuously improving data quality and enriching data resources, AI systems will be able to accurately satisfy all kinds of needs.