1,027 People - Mandarin Chinese and English Wake-up Words Scripted Monologue Microphone speech dataset

wake-up words
wake-up

Mandarin Chinese and English Wake-up Words Scripted Monologue Microphone speech dataset, collected from monologues based on given wake-up words, covering 3 speech rates: slow, normal, and fast. Transcribed with text content, accent, birthplace, gender and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (1,027 Chinese speakers) in a professional recording studio, to improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

Paid Datasets
This is a paid dataset for commercial use, research, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Format
48kHz, 16bit, uncompressed wav, mono channel
Content category
Wake-up words
Recording condition
Professional recording studio
Recording device
Microphone
Speaker
1,027 Chinese in total, 48% male and 52% female
Country
China (CHN)
Language
Mandarin Chinese, English
Features of annotation
Transcription text, accent, birthplace, gender
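
The format listed above (48 kHz, 16-bit, uncompressed, mono WAV) can be checked on a delivered file before training. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` and the constants are illustrative assumptions, not part of any tooling shipped with the dataset.

```python
import wave

# Expected values copied from the specification list above.
EXPECTED_SAMPLE_RATE_HZ = 48_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
EXPECTED_CHANNELS = 1  # mono


def check_wav_format(path):
    """Return a list of mismatches between a WAV file and the listed format.

    Hypothetical helper for illustration; an empty list means the file
    header matches the specification.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_SAMPLE_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {wav.getsampwidth() * 8} bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
        if wav.getcomptype() != "NONE":
            problems.append(f"compression type is {wav.getcomptype()}")
    return problems
```

Calling `check_wav_format("some_file.wav")` on each file (the path here is a placeholder) gives a quick header-level sanity check; it does not inspect the audio content itself.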
Sample
  • Audio

    小艺小艺


Recommended Datasets
347 Hours - Indonesian(Indonesia) Spontaneous Dialogue Smartphone speech dataset

Indonesian(Indonesia) Spontaneous Dialogue Smartphone speech dataset, collected from dialogues based on given topics, covering 20+ domains. Transcribed with text content, speaker's ID, gender, age and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (412 native speakers), to improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

audio data dataset conversational asr data Indonesian
306 Hours - Filipino(the Philippines) Spontaneous Dialogue Smartphone speech dataset

Filipino(the Philippines) Spontaneous Dialogue Smartphone speech dataset, collected from dialogues based on given topics, covering 20+ domains. Transcribed with text content, speaker's ID, gender, age and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (140 native speakers), to improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

audio data dataset conversational asr data Filipino
488 Hours - Spanish(Spain) Spontaneous Dialogue Telephony speech dataset

Spanish(Spain) Spontaneous Dialogue Telephony speech dataset, collected from dialogues based on given topics, covering 20+ domains. Transcribed with text content, speaker's ID, gender, age and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (600 native speakers), to improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

Conversational telephone Spanish
547 Hours - French(France) Spontaneous Dialogue Telephony speech dataset

French(France) Spontaneous Dialogue Telephony speech dataset, collected from dialogues based on given topics, covering 20+ domains. Transcribed with text content, speaker's ID, gender, age and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (964 native speakers), to improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

Conversational telephone French
499 Hours - Italian(Italy) Spontaneous Dialogue Telephony speech dataset

Italian(Italy) Spontaneous Dialogue Telephony speech dataset, collected from dialogues based on given topics, covering 20+ domains. Transcribed with text content, speaker's ID, gender, age and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (676 native speakers), to improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

Italian Conversational telephone
444,202 Korean Pronunciation Dictionary

The data contains 444,202 entries. All words and pronunciations are produced by Korean linguists. It can be used in the research and development of Korean ASR technology.

Korean pronunciation dictionary
1,077 Hours - Thai(Thailand) Spontaneous Dialogue Telephony speech dataset

Thai(Thailand) Spontaneous Dialogue Telephony speech dataset, collected from dialogues based on given topics, covering 20+ domains. Transcribed with text content, speaker's ID, gender, age and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (1,986 native speakers), to improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

thai Conversational telephone
127 Hours - Portuguese(Brazil) Spontaneous Dialogue Smartphone speech dataset

Portuguese(Brazil) Spontaneous Dialogue Smartphone speech dataset, collected from dialogues based on given topics, covering 20+ domains. Transcribed with text content, speaker's ID, gender, age and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (142 native speakers), to improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

Conversational speech Portuguese asr data russian asr dataset Brazilian Portuguese