200,955 Sentences - Mandarin Prosodic Corpus Data

Chinese

Prosodic Annotation

Speech Synthesis

Front-end Training Set

Four-level prosodic hierarchy annotation for 200,955 carefully selected Chinese sentences covering news and colloquial text. The sentences are of appropriate length and use diverse sentence patterns. The data can be used as a training set for TTS front-end prosody prediction.

This is a paid dataset for commercial use, research, and other purposes. Licensed, ready-made datasets help jump-start AI projects.

Specifications

Data content: prosodic annotation for 200,955 selected Chinese sentences
Data scale: 200,955 sentences
Data source: news text and human conversation
Annotation: four-level prosodic hierarchy
Language: Chinese
Application scenarios: speech synthesis
Accuracy: not lower than 99%
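This page does not describe the file format of the annotations. As a rough illustration of how four-level prosodic annotation might be turned into training labels for prosody prediction, the sketch below assumes a hypothetical inline convention in which the markers `#1` to `#4` follow the character where a break of that level occurs. The marker syntax, the example sentence, and the function names `parse_prosody` and `to_char_labels` are all assumptions made for illustration and are not taken from the dataset.

```python
import re

# ASSUMPTION: breaks are written inline as "#1".."#4" after the character
# they follow. The real dataset may use a different format.
BREAK = re.compile(r"#([1-4])")


def parse_prosody(line: str) -> tuple[str, list[tuple[int, int]]]:
    """Split an annotated line into plain text and (char_count, level) pairs.

    char_count is the number of plain-text characters preceding the break.
    """
    text_parts = []
    breaks = []
    pos = 0      # read position in the annotated line
    n_chars = 0  # plain-text characters collected so far
    for match in BREAK.finditer(line):
        chunk = line[pos:match.start()]
        text_parts.append(chunk)
        n_chars += len(chunk)
        breaks.append((n_chars, int(match.group(1))))
        pos = match.end()
    text_parts.append(line[pos:])  # trailing text after the last marker
    return "".join(text_parts), breaks


def to_char_labels(text: str, breaks: list[tuple[int, int]]) -> list[int]:
    """Per-character labels: 0 = no break after the character, else 1..4."""
    labels = [0] * len(text)
    for char_count, level in breaks:
        if 0 < char_count <= len(text):
            labels[char_count - 1] = level
    return labels


if __name__ == "__main__":
    # Hypothetical example sentence, not drawn from the corpus.
    text, breaks = parse_prosody("今天#1天气#2很好#4")
    print(text)
    print(to_char_labels(text, breaks))
```

Per-character labels of this kind are one common way to frame prosody prediction as sequence labeling; a word-level scheme would work equally well if the corpus is word-segmented.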

Recommended Dataset

200,475 Sentences - Chinese Text Normalization Data

Special symbols and Arabic numerals in the sentences are annotated as Chinese characters.

TN TTS Text Normalization

319,977 Sentences - Mandarin Polyphone Corpus Data

The Mandarin Polyphone Corpus Data is designed for polyphone disambiguation. It includes 603 common Mandarin pinyin pronunciations. The amount of corpus data varies according to the number of phrases in a single word.

Mandarin Polyphone TTS Front-end Training Data Set
