799 Hours - Sichuan Dialect(China) Spontaneous Dialogue Smartphone speech dataset

Sichuan speech data
Sichuan natural conversational speech data
Sichuan dialect conversational speech data
Sichuan dialect conversational speech dataset
Sichuan dialect conversational audio data

Sichuan Dialect (China) spontaneous dialogue smartphone speech dataset, transcribed with text content, timestamps, speaker ID, gender and other attributes. The dataset was collected from a large and geographically diverse pool of speakers, which helps model performance on real-world, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, safeguarding user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

Paid Datasets
This is a paid dataset for commercial use, research purposes and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Format: 16 kHz, 16-bit, uncompressed WAV, mono channel
Recording environment: quiet indoor environment, without echo
Recording content: no topic is specified; the speakers converse freely while being recorded
Speakers: 1,730 people, 74% of whom are female; 88% are no older than 25; all speakers are from Sichuan or Chongqing
Annotation features: transcription text, speaker identification and gender
Recording device: Android smartphone, iPhone
Country: China (CHN)
Language: Sichuan dialect
Accuracy rate: Sentence Accuracy Rate (SAR) 95%
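The format stated in the specifications (16 kHz, 16-bit, mono, uncompressed WAV) can be checked against a file's header before training. The following is a minimal sketch using Python's standard `wave` module; the function names `check_wav_format` and `make_silent_wav` are hypothetical helpers written for this illustration and are not part of any tooling shipped with the dataset.

```python
import io
import wave

# Expected values taken from the Format entry in the specifications.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
EXPECTED_CHANNELS = 1  # mono


def check_wav_format(source):
    """Return a list of mismatches between a WAV header and the stated format.

    `source` may be a file path or a binary file-like object.
    An empty list means the header matches. The `wave` module only reads
    uncompressed PCM, so a compressed file raises wave.Error instead.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {wav.getsampwidth() * 8} bits")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems


def make_silent_wav(rate_hz, channels, seconds=1):
    """Build an in-memory 16-bit WAV of silence, used only to demonstrate the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * rate_hz * channels * seconds)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(check_wav_format(make_silent_wav(16000, 1)))
    print(check_wav_format(make_silent_wav(8000, 2)))
```

In practice the same function would be called with the path of each delivered recording; files that return a non-empty list would need resampling or downmixing before use.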
Sample
  • Audio

    你有没有喜欢你有没有最近在追的啥子剧之类的嘛

  • Audio

    我就跟她讲有动漫嘛她说动漫都是改了嘞她就不看噻我就去看了动漫我感觉挺好看的

  • Audio

    就看到了魔道祖师嘞个

  • Audio

    我之前我是看过动漫所以我才来在看他的真人真人剧情版

  • Audio

    我是看我当时也是我同学嘛嘎她很喜欢看那些小说噻嘎

Recommended Datasets
97 Hours - German(Germany) Children Real-world Casual Conversation and Monologue speech dataset

German (Germany) children's real-world casual conversation and monologue speech dataset, covering self-media, conversation, live content, lectures, variety shows and other generic domains, mirroring real-world interactions. Transcribed with text content, speaker ID, gender, age, accent and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (children aged 12 and younger), which helps model performance on real-world, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, safeguarding user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

Spontaneous Speech text annotation German
548 Hours - Taiwanese Accent Mandarin(China) Real-world Casual Conversation and Monologue speech dataset

Taiwanese Accent Mandarin (China) real-world casual conversation and monologue speech dataset, covering self-media, conversation, live content and other generic domains, mirroring real-world interactions. Transcribed with text content, speaker ID, gender and other attributes. The dataset was collected from a large and geographically diverse pool of speakers, which helps model performance on real-world, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, safeguarding user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

Chinese spoken video voice data Chinese voice data Chinese spoken video data Chinese multimodal data
127 Hours - Malay(Malaysia) Spontaneous Dialogue Smartphone speech dataset

Malay (Malaysia) spontaneous dialogue smartphone speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed with text content, speaker ID, gender, age and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (142 native speakers), which helps model performance on real-world, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, safeguarding user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

audio data dataset conversational asr data Malay
143 Hours - Uyghur(China) Spontaneous Dialogue Telephony speech dataset

Uyghur (China) spontaneous dialogue telephony speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed with text content, speaker ID, gender, age and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (320 native speakers), which helps model performance on real-world, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, safeguarding user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

audio data dataset conversational asr data Uyghur
212 Hours - Burmese(Myanmar) Real-world Casual Conversation and Monologue speech dataset

Burmese (Myanmar) real-world casual conversation and monologue speech dataset, covering the service, conversation and interview domains, mirroring real-world interactions. Transcribed with text content, speaker ID, gender and other attributes. The dataset was collected from a large and geographically diverse pool of speakers, which helps model performance on real-world, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, safeguarding user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

Burmese Spontaneous Speech
503 Hours - Russian(Russia) Real-world Casual Conversation and Monologue speech dataset

Russian (Russia) real-world casual conversation and monologue speech dataset, covering the education, interview and sports domains, mirroring real-world interactions. Transcribed with text content, speaker ID, gender and other attributes. The dataset was collected from a large and geographically diverse pool of speakers, which helps model performance on real-world, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, safeguarding user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

russia Spontaneous Speech Russian
396 Hours - Korean(Korea) Real-world Casual Conversation and Monologue speech dataset

Korean (Korea) real-world casual conversation and monologue speech dataset, covering the live content, variety show and speech domains, mirroring real-world interactions. Transcribed with text content, speaker ID, gender and other attributes. The dataset was collected from a large and geographically diverse pool of speakers, which helps model performance on real-world, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, safeguarding user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

Spontaneous Speech korean
494 Hours - Hindi(India) Real-world Casual Conversation and Monologue speech dataset

Hindi (India) real-world casual conversation and monologue speech dataset, covering the education, interview and sports domains, mirroring real-world interactions. Transcribed with text content, speaker ID, gender and other attributes. The dataset was collected from a large and geographically diverse pool of speakers, which helps model performance on real-world, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, safeguarding user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

Spontaneous Speech Hindi