
m.nexdata.datatang.com

Mandarin Chinese Speech Synthesis Dataset – 370 Speakers, 200 Hours

Chinese paralanguage dataset
spontaneous dialogue dataset
Chinese conversational speech corpus
Mandarin speech synthesis corpus
Chinese speech synthesis dataset

This dataset contains 200 hours of natural conversational audio recorded by 370 native Chinese speakers. Professional phoneticians annotated 14 types of paralanguage, along with full transcriptions and speaker metadata. It is built for research and development in speech synthesis, dialogue TTS, and natural language modeling.

Paid Datasets
This is a paid dataset licensed for commercial use, research, and other purposes. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Format
48,000 Hz, 24-bit, uncompressed WAV, mono
Recording environment
Recording studio
Recording content
Speakers choose one topic from a list of 36 and hold a spontaneous dialogue on it
Speaker
370 speakers in total, aged 18 to 60
Annotation
14 types of paralanguage annotation; text transcription; speaker ID; special symbols
Device
Microphone
Language
Mandarin Chinese
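The Format line can be checked against downloaded files with Python's standard-library `wave` module. The following is a minimal sketch, not a tool supplied with the dataset. `check_wav_format` and `raw_pcm_bytes` are hypothetical names, and the expected values are copied from the specifications above. The sketch also estimates storage: 48,000 samples/s × 3 bytes per sample × 1 channel is 144,000 bytes/s, or 518,400,000 bytes per hour. At that rate, 200 hours comes to about 103.7 GB of raw PCM, not counting WAV headers. Note that `wave` reads only uncompressed PCM. Older Python versions (before 3.12) may also reject files whose header uses the WAVE_FORMAT_EXTENSIBLE tag, which some tools write for 24-bit audio. In either case, `wave.open` raises `wave.Error`.

```python
import wave

# Expected values copied from the "Format" specification:
# 48,000 Hz, 24-bit, mono, uncompressed WAV.
EXPECTED_RATE = 48_000
EXPECTED_SAMPWIDTH = 3  # bytes per sample (24-bit)
EXPECTED_CHANNELS = 1


def check_wav_format(path):
    """Return a list of mismatches between a WAV header and the stated spec.

    An empty list means the header matches. The wave module only reads
    uncompressed PCM, so other encodings raise wave.Error on open.
    """
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != EXPECTED_RATE:
            problems.append(f"sample rate {wf.getframerate()} Hz, expected {EXPECTED_RATE}")
        if wf.getsampwidth() != EXPECTED_SAMPWIDTH:
            problems.append(f"{wf.getsampwidth() * 8}-bit samples, expected 24-bit")
        if wf.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wf.getnchannels()} channels, expected mono")
    return problems


def raw_pcm_bytes(hours, rate=EXPECTED_RATE, sampwidth=EXPECTED_SAMPWIDTH,
                  channels=EXPECTED_CHANNELS):
    """Raw PCM payload size in bytes for a duration in hours (headers excluded)."""
    return int(hours * 3600 * rate * sampwidth * channels)


if __name__ == "__main__":
    print(raw_pcm_bytes(200))  # 103680000000, roughly 103.7 GB
```

Checking headers before training catches files that were resampled or downmixed somewhere in a processing pipeline. It does not validate the audio content itself.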
Sample
  • Audio

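    <V>她<S/>特</S>别喜欢<F/>就是</F>小蛋糕,我们有时候也叫她蛋糕妹,<V>因为她<S/>每</S>一天都要吃。
    (English: "She really loves, um, little cakes. We sometimes call her 'Cake Girl' because she has to eat them every day.")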

  • Audio

    <V>大窑这个饮料<M/>啊</M>还是比较好喝的,推荐去尝试一下。
    (English: "This Dayao drink, ah, is actually pretty good. I recommend giving it a try.")

  • Audio

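    <V>再加上她的男朋友,<V><F/>然后</F>每次他们一吵架,她男朋友就给她买那个小蛋糕去哄她。
    (English: "Plus there's her boyfriend. Then every time they argue, her boyfriend buys her those little cakes to cheer her up.")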
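The sample transcripts mix plain text with inline markup. This page does not define the tags, so the sketch below assumes only what the samples show. Under that assumption, a paired span opens with `<X/>` and closes with `</X>`, and a bare `<X>` is a standalone marker. The delivered annotation files, with their 14 paralanguage types and special symbols, may use a different or richer scheme. `strip_tags` and `count_tags` are hypothetical helpers: the first recovers the plain transcript, and the second tallies tags by letter.

```python
import re
from collections import Counter

# Assumed markup, inferred only from the samples on this page:
#   "<X/>text</X>" marks a paired span around spoken words,
#   "<X>" is a standalone marker. Tags are assumed to be one uppercase letter.
SPAN_RE = re.compile(r"<([A-Z])/>(.*?)</\1>")
EVENT_RE = re.compile(r"<([A-Z])>")


def strip_tags(line):
    """Remove markup, keeping the words inside paired spans."""
    text = SPAN_RE.sub(r"\2", line)
    return EVENT_RE.sub("", text)


def count_tags(line):
    """Count paired-span tags and standalone markers by tag letter."""
    counts = Counter(m.group(1) for m in SPAN_RE.finditer(line))
    counts.update(m.group(1) for m in EVENT_RE.finditer(line))
    return counts


if __name__ == "__main__":
    sample = "<V>大窑这个饮料<M/>啊</M>还是比较好喝的,推荐去尝试一下。"
    print(strip_tags(sample))                  # 大窑这个饮料啊还是比较好喝的,推荐去尝试一下。
    print(sorted(count_tags(sample).items()))  # [('M', 1), ('V', 1)]
```

Keeping the words inside spans preserves what was actually spoken, which suits ASR-style text. A TTS front end might instead keep the tags as control tokens.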
