319,977 Sentences - Mandarin Polyphone Corpus Data

Chinese Polysyllabic Corpus
Chinese polyphone corpus
Chinese corpus

The Mandarin Polyphone Corpus Data is designed for polyphone disambiguation. It covers 603 common Mandarin polyphone pronunciations. The number of sentences per pronunciation varies with the number of phrases in which the character appears.

Paid Datasets
This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Data content: Corpus for polyphone disambiguation
Data size: 603 Mandarin character-pinyin pairs and 319,977 sentences
Data source: News and colloquial sentences
Annotation: Mandarin pinyin pronunciation of the specific polyphone contained in each sentence
Language: Chinese
Application scenarios: Speech synthesis
Accuracy: Character Accuracy Rate of 99%
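
To make the annotation and accuracy entries concrete, the following is a minimal sketch of how one annotated sentence might be represented and how a character accuracy rate could be computed. The record layout (`sentence`, `index`, `pinyin`), the tone-digit pinyin style, and the two sentences are illustrative assumptions. They are not taken from the dataset, and the actual delivery format is not described on this page.

```python
from dataclasses import dataclass


@dataclass
class PolyphoneExample:
    """One annotated sentence (hypothetical layout, not the vendor's format)."""
    sentence: str  # full sentence containing the polyphone
    index: int     # position of the target polyphone in the sentence
    pinyin: str    # annotated pronunciation, tone written as a trailing digit


def target_char(example: PolyphoneExample) -> str:
    """Return the polyphonic character that the annotation refers to."""
    return example.sentence[example.index]


def character_accuracy(gold: list[PolyphoneExample], predicted: list[str]) -> float:
    """Fraction of target characters whose predicted pinyin matches the annotation."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and the same length")
    correct = sum(g.pinyin == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Illustrative sentences written for this sketch, not drawn from the corpus.
examples = [
    PolyphoneExample("我在银行工作", 3, "hang2"),  # 行 read as "hang2" in 银行
    PolyphoneExample("他行走得很快", 1, "xing2"),  # 行 read as "xing2" in 行走
]

if __name__ == "__main__":
    print(target_char(examples[0]))
    print(character_accuracy(examples, ["hang2", "hang2"]))
```

Under this reading, a 99% character accuracy rate would mean that 99 out of every 100 annotated polyphones carry the correct pronunciation label.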
Sample
  • 319,977 Sentences - Mandarin Polyphone Corpus Data