From: Nexdata Date: 2024-08-14
Among the many languages being integrated into speech recognition technology, Thai holds a significant place. Thai speech recognition has been a focal point of research and development, driven by the growing demand for localized and personalized user experiences.
Over the past few years, Thai speech recognition technology has advanced considerably, largely due to the availability of extensive linguistic data. The foundation of any speech recognition system lies in its dataset, and Thai is no exception. Voice data from various sources, including social media, podcasts, and recorded conversations, has played a pivotal role in training machine learning algorithms. As a result, Thai speech recognition systems have become markedly more accurate.
However, this progress is not devoid of challenges. The linguistic complexity of Thai poses hurdles in developing accurate recognition models. The language is tonal and features a unique script, demanding a deep understanding of its phonetics and syntax. Acquiring and annotating precise data for Thai speech recognition remains an ongoing challenge. Moreover, ensuring the inclusivity of regional accents and dialects further complicates the data collection process.
Nexdata Thai Speech Datasets
203 Hours – Thai Speech Data by Mobile Phone_Reading
This Thai read-speech data was collected from 498 native Thai speakers and recorded in a quiet environment. The recordings are rich in content, covering multiple categories such as economics, entertainment, news, figures, and oral language. Each speaker read around 400 sentences. The valid data volume is 203 hours. All texts were manually transcribed with high accuracy.
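To put those figures in perspective, the short Python sketch below derives a few rough averages from the numbers quoted above (498 speakers, around 400 sentences each, 203 valid hours). The function name and dictionary keys are made up for illustration, and because the per-speaker sentence count is only approximate, the results are estimates rather than published statistics.

```python
# Figures quoted in the dataset description.
SPEAKERS = 498
SENTENCES_PER_SPEAKER = 400  # "around 400", so treat as approximate
VALID_HOURS = 203


def corpus_estimates(speakers, sentences_per_speaker, hours):
    """Derive rough corpus-level averages from the headline numbers."""
    total_sentences = speakers * sentences_per_speaker
    total_seconds = hours * 3600
    return {
        "total_sentences": total_sentences,
        "seconds_per_sentence": total_seconds / total_sentences,
        "minutes_per_speaker": hours * 60 / speakers,
    }


estimates = corpus_estimates(SPEAKERS, SENTENCES_PER_SPEAKER, VALID_HOURS)
for name, value in estimates.items():
    print(f"{name}: {value:,.2f}")
```

Under these assumptions the corpus holds roughly 199,200 sentences, averaging about 3.7 seconds each, with about 24 minutes of valid audio per speaker.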
1,077 Hours - Thai Conversational Speech Data by Telephone
The 1,077 Hours - Thai Conversational Speech Data involved 1,986 native speakers, with a balanced gender ratio. Speakers chose a few familiar topics from a given list and started conversations, which keeps the dialogues fluent and natural. The recordings were made on a variety of mobile phones. The audio format is 8 kHz, 8-bit, and all speech data was recorded in quiet indoor environments. All audio was manually transcribed, with annotations for the text content, the start and end time of each effective sentence, and speaker identification.
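As a rough illustration of how such annotations might be consumed, the Python sketch below models one transcribed sentence and totals speech time per speaker. The `Segment` class and its field names are hypothetical, since the actual file layout of the dataset is not described here, and the three Thai utterances are made-up samples. The size estimate assumes single-channel, uncompressed audio at the stated 8 kHz, 8-bit format.

```python
from collections import defaultdict
from dataclasses import dataclass

SAMPLE_RATE_HZ = 8000  # from the description: 8 kHz
BITS_PER_SAMPLE = 8    # from the description: 8-bit


@dataclass
class Segment:
    """One transcribed sentence; the field names are hypothetical."""
    speaker_id: str
    start_s: float
    end_s: float
    text: str

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def speech_seconds_by_speaker(segments):
    """Sum the duration of every segment for each speaker."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker_id] += seg.duration_s
    return dict(totals)


def raw_audio_bytes(hours, channels=1):
    """Uncompressed size in bytes; channels=1 is an assumption."""
    bytes_per_second = SAMPLE_RATE_HZ * BITS_PER_SAMPLE // 8 * channels
    return int(hours * 3600 * bytes_per_second)


# Made-up sample annotations for a two-speaker conversation.
segments = [
    Segment("A", 0.0, 2.5, "สวัสดีครับ"),
    Segment("B", 2.75, 5.0, "สวัสดีค่ะ"),
    Segment("A", 5.5, 7.0, "สบายดีไหม"),
]
totals = speech_seconds_by_speaker(segments)
size_gb = raw_audio_bytes(1077) / 1e9
print(totals, f"{size_gb:.1f} GB")
```

With those assumptions, 1,077 hours of audio comes to about 31 GB before any compression or container overhead; a two-channel recording would double that.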