
Speech Synthesis Datasets

Instantly enhance AI model performance with high-quality off-the-shelf datasets.

Voice Type

All (20)
Average Tone (11)
Emotion (1)
Female (5)
Front-end Text (3)
Male (1)
Others (2)

Language

All (20)
Chinese Dialects (3)
Chinese-English Code-mixing (1)
English (7)
Japanese (2)
Mandarin (11)
Others (3)

Chinese Multi-emotional Modal particle and Natural Conversation Speech Synthesis Corpus

Chinese Multi-emotional Modal particle and Natural Conversation Speech Synthesis Corpus is recorded by multiple native Chinese voice actors. It includes not only sentences rich in modal particles that reflect everyday speech habits, but also free conversations on given topics. In each conversation, each speaker's audio is stored on a separate track. Professional phoneticians have annotated the text content and other information, fully meeting the precise requirements of speech synthesis research and development.
Chinese Multi-emotional Modal particle Natural Conversation Speech Synthesis TTS

Mandarin Chinese Separated Track Spontaneous Dialogue Paralanguage Annotated Speech Synthesis Corpus

Mandarin Chinese Separated Track Spontaneous Dialogue Paralanguage Annotated Speech Synthesis Corpus features a free dialogue style: given a topic, speakers express themselves freely, and in each conversation each person's audio is stored in a separate WAV file. Professional linguists have annotated 16 types of paralanguage, along with text transcriptions, timestamps, and other information, to accurately match the research and development needs of speech synthesis.
M Chinese Spontaneous Dialogue Separated track Conversation 48kHz

2 People - Mexican Spanish Average Tone Speech Synthesis Corpus

2 People - Mexican Spanish Average Tone Speech Synthesis Corpus. It is recorded by native Mexican speakers with authentic accents, covering both customer service and general styles. The phoneme coverage is balanced. Professional phoneticians participated in the annotation. It precisely matches the research and development needs of speech synthesis.
Mexican Spanish TTS Average Tone

10 Hours - Chaozhou Dialect Speech Synthesis Corpus - Female

10 Hours - Chaozhou Dialect Speech Synthesis Corpus - Female. It is recorded with Chaozhou-Shantou pronunciation. The phonemes and tones are balanced. Professional phoneticians participated in the annotation. It precisely matches the research and development needs of speech synthesis.
Synthesis Corpus Chaozhou TTS Chinese Dialect

200,475 Sentences - Chinese Text Normalization Data

200,475 Sentences - Chinese Text Normalization Data. Special symbols and Arabic numerals in the sentences are annotated as Chinese characters.
TN TTS Text Normalization

2 People - Australian English Average Tone Speech Synthesis Corpus

2 People - Australian English Average Tone Speech Synthesis Corpus. It is recorded by native Australian speakers with authentic accents. The phoneme coverage is balanced. Professional phoneticians participated in the annotation. It precisely matches the research and development needs of speech synthesis.
English TTS Australian English Average Tone

2 People - Chinese Natural Conversation Speech Synthesis Corpus

2 People - Chinese Natural Conversation Speech Synthesis Corpus. It is recorded by native Chinese speakers in a natural conversation style, with balanced phonemes and tones. Professional phoneticians participated in the annotation, including secondary-language annotation with the following tags: Inhalation: V; Pause: P; Hesitation: T; Mouth clicking: M; Drawl: D; Cough: C; Laughter: L; Stutter repetition: R; Inversion: I; Modal particle: S (modal particles include "ah", "oh", "wow", "right?", "what?", "well", etc.). It precisely matches the research and development needs of speech synthesis.
Natural conversation Secondary language TTS

12 Hours - Chinese Mandarin Synthesis Corpus - Female, Entertainment Anchor Style, Multi-emotional

12 Hours - Chinese Mandarin Entertainment Anchor Style Multi-emotional Synthesis Corpus. It is recorded by a native Chinese speaker. The text covers six emotions plus modal particles, and phonemes and tones are balanced. Professional phoneticians participated in the annotation. It precisely matches the research and development needs of speech synthesis.
Synthesis Corpus TTS Mandarin Multi-emotional Entertainment anchor

10.4 Hours - Japanese Synthesis Corpus - Female

10.4 Hours - Japanese Synthesis Corpus - Female. It is recorded by a native Japanese speaker with an authentic accent. The phoneme coverage is balanced. Professional phoneticians participated in the annotation. It precisely matches the research and development needs of speech synthesis.
Japanese TTS Female


Tailor Your Data Now

Why Off-the-Shelf Datasets?

  • Copyright: clear copyright, ready to check
  • Security: properly authorized and secure to use
  • Professional: designed and produced by AI data experts
  • Diversity: collected from a variety of real scenes
  • Cost Effective: more cost-efficient than tailored data
  • Efficiency: ready to go, delivered in seconds