AI-Enhanced Marketing Precision

From: Nexdata Date: 08/14/2024

➤ Challenges in Indonesian speech recognition

In modern artificial intelligence development, datasets are the starting point for model training and a key factor in improving algorithm performance. Whether it is computer vision data for autonomous driving or audio data for emotion analysis, high-quality datasets enable more accurate predictions. By leveraging these datasets, developers can better optimize AI systems to cope with complex real-world demands.

Indonesian is one of the most widely spoken languages globally, with over 270 million speakers spread across the archipelago. As technology becomes increasingly integrated into everyday life, it is crucial to enable Indonesian speakers to communicate with and command devices using their native language. However, developing a robust speech recognition system for Indonesian presents unique challenges due to its phonological complexity and rich morphological structure.

Training data is the backbone of any machine learning model, and speech recognition systems are no exception. High-quality training data plays a pivotal role in the accuracy and performance of these systems. In the case of Indonesian speech recognition, having a diverse and extensive dataset of spoken language is essential. This dataset should encompass a wide range of accents, dialects, and speaking styles to ensure the model's ability to adapt to variations in natural speech.

➤ Indonesian speech recognition data

Obtaining sufficient and accurate training data for Indonesian speech recognition is not without challenges. Firstly, the vast linguistic diversity across Indonesia means that the dataset must capture the nuances of various regional accents and linguistic variations. Secondly, privacy concerns and ethical considerations require developers to anonymize and secure the data while complying with data protection regulations.

Indonesian Speech Datasets

359 Hours - Indonesian Speech Data by Mobile Phone

Indonesian speech data (reading) was collected from 496 native Indonesian speakers and recorded in a quiet environment. The recordings are rich in content, covering multiple categories such as economics, entertainment, news, figure, letter, and oral. Each speaker contributed around 400 sentences. The valid data volume is 360 hours. All texts were manually transcribed with high accuracy.

496 People – Indonesian Speech Data by Mobile Phone_Guiding

Indonesian speech data (guiding) was collected from 496 native Indonesian speakers and recorded in a quiet environment. The recordings are rich in content, covering multiple categories such as in-car scenes, smart home, and speech assistants. Each speaker contributed 50 sentences. The valid data volume is 10.5 hours. All texts were manually transcribed with high accuracy.
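As a rough sanity check on the figures quoted for the reading and guiding sets, the average amount of valid audio per sentence can be estimated from the published totals. This is only a sketch: the helper name `avg_utterance_seconds` is hypothetical, and the reading set's "around 400 sentences" is treated as exactly 400, so the results are approximations rather than dataset specifications.

```python
def avg_utterance_seconds(total_hours, speakers, sentences_per_speaker):
    """Average seconds of valid audio per sentence, from the published totals."""
    total_sentences = speakers * sentences_per_speaker
    return total_hours * 3600 / total_sentences


# Figures taken from the two dataset descriptions above.
# Assumption: "around 400 sentences" is treated as exactly 400.
reading = avg_utterance_seconds(360, 496, 400)
guiding = avg_utterance_seconds(10.5, 496, 50)

print(f"reading: {reading:.2f} s per sentence")
print(f"guiding: {guiding:.2f} s per sentence")
```

Under these assumptions the reading set averages roughly 6.5 seconds per sentence and the guiding set roughly 1.5 seconds, which fits longer read passages versus short device commands.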

639 Hours - Indonesian Speech Data by Mobile Phone

1,285 native Indonesian speakers participated in the recording, with authentic accents. The recording script was designed by linguists and covers a wide range of topics, including generic, interactive, on-board, and home. The text is manually proofread with high accuracy. The data matches mainstream Android and Apple phones. The dataset can be applied to automatic speech recognition and machine translation scenarios.

➤ Indonesian Conversational Speech Data

108 Hours - Indonesian Conversational Speech Data by Mobile Phone

The 108 Hours - Indonesian conversational speech data collected by mobile phone involved 140 native speakers with a balanced gender ratio. Speakers chose a few familiar topics from a given list and started conversations, which keeps the dialogues fluent and natural. The recording devices were various mobile phones. The audio format is 16 kHz, 16-bit, uncompressed WAV, and all speech data was recorded in quiet indoor environments. All audio was manually transcribed with the text content, the start and end time of each effective sentence, and speaker identification.
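Before training on audio described this way, it can help to confirm that each file really is 16 kHz, 16-bit WAV. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format`, the file name `demo.wav`, and the mono channel count are assumptions for illustration, and the demo file is synthetic silence rather than dataset audio.

```python
import os
import tempfile
import wave


def check_wav_format(path, rate_hz=16000, sample_width_bytes=2):
    """Return (matches, duration_seconds) for one WAV file.

    wave.open raises wave.Error for formats it cannot read (for example
    compressed WAV), so a file that opens here is uncompressed PCM.
    """
    with wave.open(path, "rb") as wav:
        matches = (
            wav.getframerate() == rate_hz
            and wav.getsampwidth() == sample_width_bytes
        )
        duration = wav.getnframes() / wav.getframerate()
    return matches, duration


# Demo on a synthetic one-second silent file (not part of any dataset).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")  # hypothetical file name
    with wave.open(demo_path, "wb") as out:
        out.setnchannels(1)       # assumption: mono; the channel count is not stated
        out.setsampwidth(2)       # 2 bytes = 16-bit samples
        out.setframerate(16000)   # 16 kHz
        out.writeframes(b"\x00\x00" * 16000)
    result = check_wav_format(demo_path)

print(result)
```

The returned duration can also be summed across files to cross-check a corpus against its advertised number of hours.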

89 Hours - Indonesian Conversational Speech Data by Telephone

The 89 Hours - Indonesian conversational speech data collected by telephone involved 124 native speakers with a balanced gender ratio. Speakers chose a few familiar topics from a given list and started conversations, which keeps the dialogues fluent and natural. The recording devices were various mobile phones. The audio format is 8 kHz, 8-bit, μ-law PCM, and all speech data was recorded in quiet indoor environments. All audio was manually transcribed with the text content, the start and end time of each effective sentence, and speaker identification.
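Most training pipelines expect linear PCM, so 8-bit μ-law audio is usually expanded to 16-bit samples first. The sketch below follows the standard G.711 μ-law expansion in pure Python. It assumes the input is a raw μ-law byte stream; the article does not say how the files are packaged, and the function names are hypothetical.

```python
def ulaw_byte_to_linear(byte):
    """Expand one G.711 mu-law byte to a signed 16-bit linear PCM sample."""
    u = ~byte & 0xFF              # mu-law bytes are stored bit-inverted
    sign = u & 0x80               # top bit marks a negative sample
    exponent = (u >> 4) & 0x07    # 3-bit segment number
    mantissa = u & 0x0F           # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 is the bias
    return -magnitude if sign else magnitude


def decode_ulaw(data):
    """Decode a bytes object of mu-law samples into a list of ints."""
    return [ulaw_byte_to_linear(b) for b in data]


# 0xFF and 0x7F both encode silence; 0x00 and 0x80 are the two extremes.
samples = decode_ulaw(bytes([0xFF, 0x7F, 0x00, 0x80]))
print(samples)
```

The decoded values stay within the signed 16-bit range, so they can be written out as ordinary 16-bit PCM and resampled if a model expects a higher sample rate.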

In the future, as all kinds of data are collected and annotated, how will AI technology gradually change our lives? The future of AI data is full of potential; let’s explore its possibilities together. If you have data requirements, please contact Nexdata.ai at [email protected].
