283 Hours - Indonesian(Indonesia) Spontaneous Dialogue Telephony speech dataset

audio
data
dataset
conversational
asr data
Indonesian
telephone

Indonesian(Indonesia) Spontaneous Dialogue Telephony speech dataset, collected from dialogues on given topics and covering more than 20 domains. Recordings are transcribed and annotated with text content, speaker ID, gender, age, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (376 native speakers), which enhances model performance on real-world, complex tasks. Its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

Paid Datasets
This is a paid dataset for commercial use, research, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Format: 8 kHz, 8-bit, A-law/u-law PCM, mono channel
Content category: Dialogue based on given topics
Recording condition: Low background noise (indoor)
Recording device: Telephony
Speaker: 376 people in total, 53% male and 47% female
Country: Indonesia (IDN)
Language (Region) Code: id-ID
Language: Indonesian
Features of annotation: Transcription text, timestamp, speaker ID, gender, noise
Accuracy rate: Word accuracy rate (WAR) 98%
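
The format listed above (8 kHz, 8-bit, u-law companded, mono) usually has to be expanded to 16-bit linear PCM before it is fed to an ASR pipeline. The following is a minimal sketch of the standard G.711 u-law expansion in pure Python. It assumes the audio is available as a headerless byte string of u-law codes; the listing does not say how files are packaged, so the function names and the raw-bytes input are illustrative assumptions. A-law files need a different expansion and are not covered here.

```python
SAMPLE_RATE_HZ = 8000  # from the specification: 8 kHz, 8-bit, mono


def ulaw_byte_to_pcm16(code: int) -> int:
    """Expand one G.711 u-law byte to a signed 16-bit linear sample."""
    code = ~code & 0xFF            # u-law bytes are stored bit-inverted
    sign = code & 0x80             # top bit carries the sign
    exponent = (code >> 4) & 0x07  # 3-bit segment number
    mantissa = code & 0x0F         # 4-bit step within the segment
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -magnitude if sign else magnitude


def decode_ulaw(payload: bytes) -> list[int]:
    """Decode a raw u-law byte string (assumed headerless) to PCM samples."""
    return [ulaw_byte_to_pcm16(b) for b in payload]


def duration_seconds(payload: bytes) -> float:
    """One byte per sample and one channel, so bytes / sample rate = seconds."""
    return len(payload) / SAMPLE_RATE_HZ


if __name__ == "__main__":
    silence = bytes([0xFF]) * SAMPLE_RATE_HZ  # 0xFF is u-law silence
    print(decode_ulaw(silence)[:4], duration_seconds(silence))
```

If the files turn out to be WAV containers rather than raw bytes, the header would need to be parsed first and only the data chunk passed to `decode_ulaw`.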
Sample
  • Audio

    [N] Kenapa kamu ngefan sama Raffi Ahmad?

  • Audio

    Kalau komputer itu kan ya, nanti udah ada soal untuk kurikulumnya lah.

  • Audio

    Misal pelajaran faktor [S] tentang komputer, apa basic data.

  • Audio

    Eh, aku sekarang ngefannya sama artis Indonesia, Raffi Ahmad.

  • Audio

    Kamu sekarang ngefan sama siapa?
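
Some of the sample transcripts above contain bracketed markers such as `[N]` and `[S]`. The listing does not define them; since "noise" appears among the annotation features, they are presumably non-speech event tags, but that reading is an assumption. A minimal sketch of separating such tags from the spoken text before training or scoring (the function name `split_tags` is hypothetical) could look like this:

```python
import re

# Matches short bracketed markers such as [N] or [S].
# Their exact meaning is not documented in the listing (assumption: noise tags).
TAG_PATTERN = re.compile(r"\[([A-Za-z]+)\]")


def split_tags(transcript: str) -> tuple[str, list[str]]:
    """Return the transcript without bracketed tags, plus the tags found."""
    tags = TAG_PATTERN.findall(transcript)
    text = TAG_PATTERN.sub(" ", transcript)
    text = re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace
    return text, tags


if __name__ == "__main__":
    print(split_tags("[N] Kenapa kamu ngefan sama Raffi Ahmad?"))
    print(split_tags("Misal pelajaran faktor [S] tentang komputer, apa basic data."))
```

Keeping the tags as a separate list, rather than discarding them, leaves the option of using them later, for example to filter out noisy segments.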

Recommended Datasets
548 Hours - Taiwanese Accent Mandarin(China) Real-world Casual Conversation and Monologue speech dataset

The Taiwanese Accent Mandarin(China) Real-world Casual Conversation and Monologue speech dataset covers self-media, conversation, live, and other generic domains, mirroring real-world interactions. Recordings are transcribed and annotated with text content, speaker ID, gender, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers, which enhances model performance on real-world, complex tasks. Its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

Chinese spoken video voice data Chinese voice data Chinese spoken video data Chinese multimodal data
127 Hours - Malay(Malaysia) Spontaneous Dialogue Smartphone speech dataset

Malay(Malaysia) Spontaneous Dialogue Smartphone speech dataset, collected from dialogues on given topics and covering more than 20 domains. Recordings are transcribed and annotated with text content, speaker ID, gender, age, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (142 native speakers), which enhances model performance on real-world, complex tasks. Its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

audio data dataset conversational asr data Malay
143 Hours - Uyghur(China) Spontaneous Dialogue Telephony speech dataset

Uyghur(China) Spontaneous Dialogue Telephony speech dataset, collected from dialogues on given topics and covering more than 20 domains. Recordings are transcribed and annotated with text content, speaker ID, gender, age, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (320 native speakers), which enhances model performance on real-world, complex tasks. Its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

audio data dataset conversational asr data Uyghur
212 Hours - Burmese(Myanmar) Real-world Casual Conversation and Monologue speech dataset

The Burmese(Myanmar) Real-world Casual Conversation and Monologue speech dataset covers service, conversation, and interview domains, mirroring real-world interactions. Recordings are transcribed and annotated with text content, speaker ID, gender, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers, which enhances model performance on real-world, complex tasks. Its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

Burmese Spontaneous Speech
503 Hours - Russian(Russia) Real-world Casual Conversation and Monologue speech dataset

The Russian(Russia) Real-world Casual Conversation and Monologue speech dataset covers education, interview, and sports domains, mirroring real-world interactions. Recordings are transcribed and annotated with text content, speaker ID, gender, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers, which enhances model performance on real-world, complex tasks. Its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

russia Spontaneous Speech Russian
396 Hours - Korean(Korea) Real-world Casual Conversation and Monologue speech dataset

The Korean(Korea) Real-world Casual Conversation and Monologue speech dataset covers live, variety-show, and speech domains, mirroring real-world interactions. Recordings are transcribed and annotated with text content, speaker ID, gender, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers, which enhances model performance on real-world, complex tasks. Its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

Spontaneous Speech korean
494 Hours - Hindi(India) Real-world Casual Conversation and Monologue speech dataset

The Hindi(India) Real-world Casual Conversation and Monologue speech dataset covers education, interview, and sports domains, mirroring real-world interactions. Recordings are transcribed and annotated with text content, speaker ID, gender, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers, which enhances model performance on real-world, complex tasks. Its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

Spontaneous Speech Hindi
501 Hours - Indonesian(Indonesia) Real-world Casual Conversation and Monologue speech dataset

The Indonesian(Indonesia) Real-world Casual Conversation and Monologue speech dataset covers self-media, conversation, live, and other generic domains, mirroring real-world interactions. Recordings are transcribed and annotated with text content, speaker ID, gender, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers, which enhances model performance on real-world, complex tasks. Its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

Indonesian Colloquial Video text annotation