101 Hours - Italian(Italy) Children Real-world Casual Conversation and Monologue speech dataset

Spontaneous Speech Data
text annotation
Italian

The Italian (Italy) Children Real-world Casual Conversation and Monologue speech dataset covers self-media, conversation, live streaming, lectures, variety shows and other generic domains, mirroring real-world interactions. It is transcribed with text content, speaker ID, gender, age, accent and other attributes. The data was collected from a large and diverse pool of speakers (children aged 12 and younger) across different geographic regions, which helps improve model performance on real, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All of our datasets comply with GDPR, CCPA, and PIPL.

Paid Datasets
This is a paid dataset for commercial use, research purposes and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Format: 16 kHz, 16-bit, WAV, mono channel
Age: Children aged 12 and younger
Content category: Interview, self-media, variety show, etc.
Recording environment: Low background noise
Country: Italy (ITA)
Language (Region) Code: it-IT
Language: Italian
Features of annotation: Transcription text, timestamp, speaker ID, gender, noise
Accuracy: Word Accuracy Rate (WAR) 98%
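
The format and accuracy figures above can be checked on delivered files with a short script. The following is a minimal Python sketch, not the vendor's tooling: the function names are hypothetical, and it assumes WAR is defined as 1 minus the word-level edit distance divided by the number of reference words, which is a common convention but is not stated on this page.

```python
import wave

# Expected values taken from the specifications above.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit
EXPECTED_CHANNELS = 1            # mono


def matches_spec(path):
    """Return True if the WAV file at `path` is 16 kHz, 16-bit, mono."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
        )


def word_accuracy_rate(reference, hypothesis):
    """Assumed definition: 1 - (word edit distance / reference word count)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return 1.0 - prev[-1] / len(ref)
```

With this definition, a transcript with one wrong word out of four reference words scores 0.75. Text normalization (case, punctuation) is left out here and would affect the score in practice.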
Sample
  • Audio

    Eh! [N]

  • Audio

    No! No! Ah! [N]

  • Audio

    Con tanti passaggi segreti. [N]

  • Audio

    Vabbè, comincio, Filippo non accettava rimproveri da nessuno. [N]

  • Audio

    Cooper ma parli? [N]
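
The sample transcripts above end with an `[N]` tag. A minimal Python sketch for separating the spoken text from that tag follows. It assumes `[N]` is the noise marker mentioned under the annotation features, which this page does not confirm, and the function name `split_transcript` is hypothetical.

```python
import re

# Assumption: "[N]" marks noise in a transcript line.
NOISE_TAG = re.compile(r"\s*\[N\]")


def split_transcript(line):
    """Return (text_without_tag, has_noise_tag) for one transcript line."""
    has_noise = bool(NOISE_TAG.search(line))
    text = NOISE_TAG.sub("", line).strip()
    return text, has_noise
```

For example, `split_transcript("No! No! Ah! [N]")` yields the text `No! No! Ah!` together with a flag indicating the tag was present.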

Recommended Datasets
548 Hours - Taiwanese Accent Mandarin(China) Real-world Casual Conversation and Monologue speech dataset

The Taiwanese Accent Mandarin (China) Real-world Casual Conversation and Monologue speech dataset covers self-media, conversation, live streaming and other generic domains, mirroring real-world interactions. It is transcribed with text content, speaker ID, gender and other attributes. The data was collected from a large and diverse pool of speakers across different geographic regions, which helps improve model performance on real, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All of our datasets comply with GDPR, CCPA, and PIPL.

Chinese spoken video voice data Chinese voice data Chinese spoken video data Chinese multimodal data
127 Hours - Malay(Malaysia) Spontaneous Dialogue Smartphone speech dataset

The Malay (Malaysia) Spontaneous Dialogue Smartphone speech dataset was collected from dialogues based on given topics, covering 20+ domains. It is transcribed with text content, speaker ID, gender, age and other attributes. The data was collected from a large and diverse pool of speakers (142 native speakers) across different geographic regions, which helps improve model performance on real, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All of our datasets comply with GDPR, CCPA, and PIPL.

audio data dataset conversational asr data Malay
143 Hours - Uyghur(China) Spontaneous Dialogue Telephony speech dataset

The Uyghur (China) Spontaneous Dialogue Telephony speech dataset was collected from dialogues based on given topics, covering 20+ domains. It is transcribed with text content, speaker ID, gender, age and other attributes. The data was collected from a large and diverse pool of speakers (320 native speakers) across different geographic regions, which helps improve model performance on real, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All of our datasets comply with GDPR, CCPA, and PIPL.

audio data dataset conversational asr data Uyghur
212 Hours - Burmese(Myanmar) Real-world Casual Conversation and Monologue speech dataset

The Burmese (Myanmar) Real-world Casual Conversation and Monologue speech dataset covers the service, conversation and interview domains, mirroring real-world interactions. It is transcribed with text content, speaker ID, gender and other attributes. The data was collected from a large and diverse pool of speakers across different geographic regions, which helps improve model performance on real, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All of our datasets comply with GDPR, CCPA, and PIPL.

Burmese Spontaneous Speech
503 Hours - Russian(Russia) Real-world Casual Conversation and Monologue speech dataset

The Russian (Russia) Real-world Casual Conversation and Monologue speech dataset covers the education, interview and sports domains, mirroring real-world interactions. It is transcribed with text content, speaker ID, gender and other attributes. The data was collected from a large and diverse pool of speakers across different geographic regions, which helps improve model performance on real, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All of our datasets comply with GDPR, CCPA, and PIPL.

russia Spontaneous Speech Russian
396 Hours - Korean(Korea) Real-world Casual Conversation and Monologue speech dataset

The Korean (Korea) Real-world Casual Conversation and Monologue speech dataset covers the live streaming, variety show and speech domains, mirroring real-world interactions. It is transcribed with text content, speaker ID, gender and other attributes. The data was collected from a large and diverse pool of speakers across different geographic regions, which helps improve model performance on real, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All of our datasets comply with GDPR, CCPA, and PIPL.

Spontaneous Speech korean
494 Hours - Hindi(India) Real-world Casual Conversation and Monologue speech dataset

The Hindi (India) Real-world Casual Conversation and Monologue speech dataset covers the education, interview and sports domains, mirroring real-world interactions. It is transcribed with text content, speaker ID, gender and other attributes. The data was collected from a large and diverse pool of speakers across different geographic regions, which helps improve model performance on real, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All of our datasets comply with GDPR, CCPA, and PIPL.

Spontaneous Speech Hindi
501 Hours - Indonesian(Indonesia) Real-world Casual Conversation and Monologue speech dataset

The Indonesian (Indonesia) Real-world Casual Conversation and Monologue speech dataset covers self-media, conversation, live streaming and other generic domains, mirroring real-world interactions. It is transcribed with text content, speaker ID, gender and other attributes. The data was collected from a large and diverse pool of speakers across different geographic regions, which helps improve model performance on real, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All of our datasets comply with GDPR, CCPA, and PIPL.

Indonesian Colloquial Video text annotation