100 People - Chinese Multi-emotional Modal particle and Natural Conversation Speech Synthesis Corpus

Chinese

Multi-emotional

Modal particle

Natural Conversation

Speech Synthesis

TTS

The Chinese Multi-emotional Modal particle and Natural Conversation Speech Synthesis Corpus was recorded by multiple native Chinese voice actors. It includes sentences rich in modal particles that match everyday speech habits, as well as free conversations on given topics. In each conversation, each speaker's audio is stored on a separate track. Professional phoneticians annotated the text content, emotion, paralinguistic phenomena, speech rate, and timbre, so the corpus is suited to speech synthesis research and development.

This is a paid dataset for commercial use, research, and other purposes. Licensed, ready-made datasets help jump-start AI projects.

Specifications

Format: modal particle recordings are 48 kHz, 24-bit, WAV, mono; natural conversation recordings are 48 kHz, 24-bit, WAV, stereo (each speaker on a separate track) or mono

Recording condition: recording studio

Recording content: (1) reading texts containing modal particles in a natural way; (2) holding a natural conversation on a given topic

Features of annotation: transcription text, emotion labeling, paralinguistic labeling, speech rate labeling, timbre labeling

Device: microphone

Speaker: 100 professional voice actors, 50 male and 50 female

Language: Chinese

Application scenarios: speech synthesis
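As a rough illustration of the format specification above, the following Python sketch checks whether a WAV file is 48 kHz and 24-bit with the expected channel count, and splits a stereo conversation file into one mono file per speaker. The function names `matches_spec` and `split_stereo` and all file paths are hypothetical and are not part of the dataset or of any Nexdata tooling. It uses only the standard-library `wave` module; note that Python versions before 3.12 may reject 24-bit files written with a WAVE_FORMAT_EXTENSIBLE header.

```python
import wave


def matches_spec(path, channels, rate=48000, sample_bytes=3):
    """Return True if the WAV header matches the listed format.

    sample_bytes=3 corresponds to 24-bit samples. The defaults mirror the
    specification above; they are assumptions about the delivered files.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == rate
            and wav.getsampwidth() == sample_bytes
            and wav.getnchannels() == channels
        )


def split_stereo(src_path, left_path, right_path):
    """Write each channel of a stereo PCM WAV file to its own mono file."""
    with wave.open(src_path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a stereo file")
        width = src.getsampwidth()
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    # Stereo PCM frames are interleaved: [left sample][right sample]...
    step = 2 * width
    left = b"".join(frames[i:i + width] for i in range(0, len(frames), step))
    right = b"".join(frames[i + width:i + step] for i in range(0, len(frames), step))

    for path, data in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(data)
```

A conversation file could then be checked with `matches_spec("conversation.wav", channels=2)` before calling `split_stereo` to obtain one track per speaker.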

Sample

刚刚看到一只超可爱的小狗<M/>啊</M>！ Medium_Speed Neutral
今天天气真好<M/>啊</M>！ Medium_Speed Happy
对，<M/>呃</M>那你先说一下咱们一起去的第一个城市武汉<M/>吧</M>？ Medium_Speed Neutral
<M/>嗯</M>，那你说说<M/>吧</M>，你说说你对武汉还有什么印象？ Medium_Speed Neutral
<M/>嗯</M>，我觉得他特别热心，他还给咱们建议说，<A/><D/><M/>嗯</M></D>要避雷这个什么，那个怎么样，推荐去哪个，然后去推荐吃什么好吃的，然后他们就特别贴心的说这些</A>，<M/>哎</M>非常的感激，现在想一下觉得好温暖。 Medium_Speed Neutral
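
The sample lines above appear to follow the pattern "annotated text, speech rate label, emotion label". A minimal Python sketch for reading such a line is given below, assuming the last two whitespace-separated tokens are the speech rate and emotion labels and that `<M/>…</M>` wraps modal particles. Both are inferences from the samples, not a documented schema, and the meaning of the `<A/>` and `<D/>` tags is not stated on this page, so the sketch only strips them. The names `Sample` and `parse_sample` are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches any marker seen in the samples: <M/>, </M>, <A/>, </A>, <D/>, </D>.
TAG = re.compile(r"</?[A-Z]+/?>")
# Captures the text between an opening <M/> marker and its closing </M>.
M_SPAN = re.compile(r"<M/>(.*?)</M>")


@dataclass
class Sample:
    text: str          # transcription with inline markers
    plain_text: str    # transcription with all markers removed
    m_spans: list      # contents of every <M/>...</M> span, in order
    speed: str         # speech rate label, e.g. "Medium_Speed"
    emotion: str       # emotion label, e.g. "Happy"


def parse_sample(line):
    """Split one sample line into text and labels (assumed layout)."""
    text, speed, emotion = line.rsplit(None, 2)
    return Sample(
        text=text,
        plain_text=TAG.sub("", text),
        m_spans=M_SPAN.findall(text),
        speed=speed,
        emotion=emotion,
    )


sample = parse_sample("今天天气真好<M/>啊</M>！ Medium_Speed Happy")
print(sample.plain_text)  # 今天天气真好啊！
print(sample.m_spans)     # ['啊']
print(sample.emotion)     # Happy
```

Keeping both the tagged and the plain text lets a TTS pipeline feed clean text to the front end while still using the marked spans as labels.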

Recommended Datasets

2 Speakers – Korean TTS Dataset with Native Accent

This dataset contains recordings from 2 native Korean speakers with authentic accents. It covers news and colloquial general-domain text, with balanced phoneme coverage. Professional phoneticians participate in the annotation. It is suited to research and development in text-to-speech, Korean speech synthesis, and AI voice applications.

Korean speech dataset, Korean TTS dataset, Korean speech synthesis corpus, Korean voice dataset for AI, Korean accent speech corpus, Korean text-to-speech dataset, Korean speech recordings for TTS

14 Hours Taiwan Mandarin TTS Dataset – Multi-Style Voices

This dataset contains 14 hours of Taiwan Mandarin recordings from 4 professional voice actors in 7 speaking styles: criminal subordinate, rough man, little girl, kind grandma, businessman, grandfather, and non-commissioned officer. Professional phoneticians participate in the annotation. It is suited to text-to-speech (TTS), expressive voice generation, virtual avatars, and AI speech synthesis applications.

Taiwan Mandarin speech dataset, Taiwan Mandarin voice dataset, Taiwan Mandarin speech corpus for AI, Mandarin accent dataset, Taiwan Mandarin TTS dataset

20 Hours Japanese TTS Dataset – Native Japanese Voice Corpus

This dataset contains recordings from 2 native Japanese speakers with authentic accents, each contributing 10 hours of audio. It covers news and colloquial general-domain text, with balanced phoneme coverage. Professional phoneticians participate in the annotation. It matches the research and development needs of building Japanese text-to-speech systems, speech synthesis research, and AI voice applications.

Japanese speech dataset, Japanese TTS dataset, Japanese speech synthesis corpus, Japanese voice dataset for AI, native Japanese speech dataset, Japanese text-to-speech dataset, balanced phoneme Japanese corpus

6 Speakers – Taiwanese Mandarin Speech Dataset for TTS

This dataset includes recordings from 6 professional voice actors from Taiwan, covering news and colloquial speech. The phoneme coverage is balanced. Professional phoneticians participate in the annotation. It matches the research and development needs of speech synthesis.

Taiwanese Mandarin speech dataset, Taiwan Mandarin TTS dataset, Mandarin speech synthesis corpus, native Taiwanese Mandarin corpus

8 Hours - Canadian French TTS Dataset (Native Accent)

This dataset contains recordings from 2 native Canadian French speakers with authentic accents. It is ideal for researchers and developers seeking natural Canadian French voices.

Canadian French TTS dataset, Canadian French speech dataset for AI, Canadian French accent speech corpus, Canadian French text to speech voices, Canadian French speech dataset

2 Speakers – Australian English TTS Dataset (Native Accent)

This dataset features recordings from 2 native Australian English speakers with authentic accents. The phoneme coverage is balanced. Professional phoneticians participate in the annotation. It matches the research and development needs of speech synthesis.

Australian English TTS dataset, Australian speech dataset for AI, Australian accent speech dataset, Australian text to speech voices, multi-speaker Australian English dataset, Australian English phoneme balanced dataset

Dutch TTS Voice Dataset (2 Speakers) for Speech Synthesis

This Dutch speech dataset includes 10 hours of recordings from two native speakers. It features authentic Dutch accents and balanced phoneme coverage. All recordings are annotated with the involvement of professional phoneticians. It is suitable for TTS model training, voice cloning, and fine-tuning tasks.

Dutch TTS dataset, Dutch speech dataset, Dutch voice dataset, Dutch speech synthesis dataset, Netherlands Dutch voice dataset, Dutch dataset for text to speech, European language speech dataset

12 Hours – Italian TTS Dataset with Native Accent

This dataset includes recordings from 3 native Italian speakers with authentic accents, covering both customer service and general speaking styles. The phoneme coverage is balanced. Professional phoneticians participate in the annotation. It matches the research and development needs of speech synthesis.

Italian speech dataset for TTS, Italian text to speech dataset, Italian voice dataset for AI, Italian accent speech dataset, multi-speaker Italian TTS dataset, Italian TTS dataset

Copyright © 2023 NEXDATA TECHNOLOGY INC
