100 Speakers - Chinese Multi-Emotion Speech Synthesis Dataset

Chinese emotional speech data

Chinese conversational speech corpus

Chinese natural conversation dataset

Chinese prosody dataset

This dataset was recorded by 100 professional Chinese voice actors. It includes sentences rich in modal particles that reflect everyday speech habits, as well as free conversations on given topics. Each speaker's audio is stored in a separate track. All recordings are annotated by professional phoneticians with text, timestamps, and prosody details, meeting the precise requirements of speech synthesis, emotion recognition, and prosody modeling research.

This is a paid dataset licensed for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.

Recommended Datasets

4 People - Chinese High-expressivity Narration Average Tone Speech Synthesis Corpus

4 People - Chinese High-expressivity Narration Average Tone Speech Synthesis Corpus was recorded by professional character voice actors. Given a book, each speaker reads it in a highly expressive narration style.

High-expressivity Narration TTS Chinese

5 People - Multi-style And Multi-emotional Average Tone Speech Synthesis Corpus

5 People - Multi-style And Multi-emotional Average Tone Speech Synthesis Corpus was recorded by professional character voice actors. It covers four styles: the capable female boss, the straightforward prince, the nimble maid, and the kind elderly lady. Emotions include disdain, anger, happiness, concern, surprise, gasp of fear, cold snort (disdain), sympathy, laughter, inner thoughts, seriousness, disgust, puzzlement, sadness, and neutrality.

Synthesis Corpus TTS Mandarin Chinese Multi-style Multi-emotional

5 Hours - Wuhan Dialect Speech Synthesis Corpus - Male

5 Hours - Wuhan Dialect Speech Synthesis Corpus - Male was recorded by a native Wuhan speaker. The recordings contain the speaker's self-expression, specified texts on multiple topics, interjections, mixed Chinese-English sentences, and pure English words. Professional phoneticians participated in the annotation. It precisely matches the research and development needs of speech synthesis.

Dialect TTS

5 Hours - Changsha Dialect Speech Synthesis Corpus - Female

5 Hours - Changsha Dialect Speech Synthesis Corpus - Female was recorded by a native Changsha speaker. The recordings contain the speaker's self-expression, specified texts on multiple topics, interjections, mixed Chinese-English sentences, and pure English words. Professional phoneticians participated in the annotation. It precisely matches the research and development needs of speech synthesis.

Dialect TTS Changsha

2 People - Cantonese Multi-emotional Natural Conversation Speech Synthesis Corpus

2 People - Cantonese Multi-emotional Natural Conversation Speech Synthesis Corpus was recorded by native speakers from Guangdong in a natural conversation style: given a topic, each speaker expresses themselves freely. The emotions include neutral, happiness, anger, fear, disgust, sadness, and more. Professional phoneticians participated in the annotation, labeling both emotions and secondary language (paralanguage). It precisely matches the research and development needs of highly natural, emotionally rich speech synthesis.

Multi-emotional Natural Conversation Secondary language TTS Cantonese

Mandarin Chinese Multi-Stream Speech Dataset – 294 Speakers, 203 Hours

This Mandarin Chinese speech synthesis dataset features 294 speakers and a total of 203 hours of audio. It is gender-balanced (144 females and 150 males), with speakers aged 18 to 60. Speakers hold free-form dialogues on given topics, and within each conversation each person's audio is stored in a separate WAV file. Professional linguists have annotated 16 types of paralanguage, along with text transcriptions, timestamps, and other information, to accurately match the research and development needs of speech synthesis and paralanguage research.

paralanguage speech dataset Mandarin speech synthesis corpus Chinese speech synthesis dataset spontaneous dialogue speech synthesis annotated speech synthesis dataset dialogue speech synthesis dataset multi-stream speech synthesis dataset Chinese paralanguage dataset spontaneous dialogue dataset multi-stream speech corpus

2 Speakers – Korean TTS Dataset with Native Accent

This dataset contains recordings from 2 native Korean speakers with an authentic accent. The corpus covers news and colloquial general texts with balanced phoneme coverage. Professional phoneticians participated in the annotation. It precisely matches the research and development needs of text-to-speech, Korean speech synthesis, and AI voice applications.

Korean speech dataset Korean TTS dataset Korean speech synthesis corpus Korean voice dataset for AI Korean accent speech corpus Korean text-to-speech dataset Korean speech recordings for TTS

14 Hours Taiwan Mandarin TTS Dataset – Multi-Style Voices

This dataset contains 14 hours of Taiwan Mandarin recordings from 4 professional voice actors in 7 speaking styles: criminal subordinate, rough man, little girl, kind grandma, businessman, grandfather, and non-commissioned officer. Professional phoneticians participated in the annotation. It is ideal for text-to-speech (TTS), expressive voice generation, virtual avatars, and AI speech synthesis applications.

Taiwan Mandarin speech dataset Taiwan Mandarin voice dataset Taiwan Mandarin speech corpus for AI Mandarin accent dataset Taiwan Mandarin TTS dataset