303 Hours English-Mandarin Bilingual Speech Dataset – Mobile Phone Recordings

chinese-english speech dataset

bilingual speech dataset

mixed language speech dataset

chinese-english audio dataset

code-switching speech dataset

This dataset contains 303 hours of Chinese-English mixed speech, recorded as monologues read from prompts that mix Chinese and English, covering the general and human-computer interaction domains. Each recording is transcribed, with the text content and other attributes annotated. The data was collected from a large, geographically diverse pool of 1,113 speakers, which is intended to improve model performance on real-world tasks such as ASR, TTS, code-switching, and other bilingual speech tasks. The dataset has been quality-tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, safeguarding user privacy and legal rights throughout data collection, storage, and use; all of our datasets comply with GDPR, CCPA, and PIPL.

This is a paid dataset for commercial use, research, and other purposes. Licensed, ready-made datasets help jump-start AI projects.

Specifications

Format: 16 kHz, 16-bit, uncompressed WAV, mono channel
Content category: General domain, human-machine interaction
Recording condition: Indoor, low background noise, no echo
Recording device: Android smartphone, iPhone
Speakers: 1,113 speakers in total, 45% male and 55% female; 75% of speakers are aged 14-25 and 25% are aged 26-46
Country: China (CHN)
Language: Mandarin Chinese, English
Annotation features: Transcription text
Accuracy rate: Sentence Accuracy Rate (SAR) of 97%
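
The audio format listed in the specifications can be checked mechanically before the files are used. The following is a minimal sketch, assuming the delivered files are standard PCM WAV files readable by Python's built-in wave module. The function names (read_wav_format, matches_spec, estimated_pcm_bytes) are hypothetical helpers for illustration and are not part of any Nexdata tooling. The last helper estimates the raw audio size implied by a given number of hours, ignoring file headers.

```python
import wave

# Values taken from the specification: 16 kHz, 16-bit (2 bytes), mono.
EXPECTED = {"sample_rate": 16000, "sample_width_bytes": 2, "channels": 1}


def read_wav_format(path):
    """Return the sample rate, sample width, and channel count of a PCM WAV file."""
    with wave.open(path, "rb") as wf:
        return {
            "sample_rate": wf.getframerate(),
            "sample_width_bytes": wf.getsampwidth(),
            "channels": wf.getnchannels(),
        }


def matches_spec(path):
    """True if the file header matches the 16 kHz / 16-bit / mono specification."""
    return read_wav_format(path) == EXPECTED


def estimated_pcm_bytes(hours, sample_rate=16000, sample_width_bytes=2, channels=1):
    """Raw PCM payload size in bytes for the given duration (headers excluded)."""
    return int(hours * 3600 * sample_rate * sample_width_bytes * channels)
```

Under these assumptions, estimated_pcm_bytes(303) gives 34,905,600,000 bytes, or roughly 34.9 GB of raw audio for the full 303 hours.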

Sample

Audio: 我说放一首Stop The Drama.
Audio: Easy Life是上午九点上班吗?
Audio: 跟室友的sheet对比照
Audio: 我叫你帮我打开Sha DOW Rocket.
Audio: 现在我无比后悔没有选合照course
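
Transcripts like the samples above mix Chinese characters and English words within a single utterance. As a rough illustration, the sketch below splits a transcript into Chinese and English runs. It is a simplification under stated assumptions: only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) counts as Chinese, only ASCII letters count as English, and digits and punctuation are dropped. The helper name split_by_script is hypothetical and is not part of the dataset's tooling.

```python
import re

# One alternative per script; consecutive English words joined by a space,
# apostrophe, or hyphen are kept together as a single run.
_SEGMENT = re.compile(
    r"(?P<zh>[\u4e00-\u9fff]+)|(?P<en>[A-Za-z]+(?:[ '\-][A-Za-z]+)*)"
)


def split_by_script(text):
    """Split a code-switched transcript into (label, segment) pairs.

    Labels are "zh" for runs of Chinese characters and "en" for runs of
    ASCII-letter words. Anything else (digits, punctuation) is skipped.
    """
    return [(m.lastgroup, m.group()) for m in _SEGMENT.finditer(text)]


if __name__ == "__main__":
    # Sample transcript taken from the page.
    print(split_by_script("我说放一首Stop The Drama."))
    # [('zh', '我说放一首'), ('en', 'Stop The Drama')]
```

Counting the "zh" and "en" segments per utterance gives a quick, approximate view of how often the language switches, which can help when inspecting the transcripts.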

Recommended Datasets

275 Hours - Mixed Speech with Korean and English Data by Mobile Phone

A scripted-monologue smartphone speech dataset of mixed Korean and English speech, recorded from given prompts and covering the spoken-language, human-machine interaction, smart home command, in-car command, numbers, and news categories. The data was collected from a large, geographically diverse pool of 737 native speakers, which is intended to improve model performance on real-world tasks. The dataset has been quality-tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, safeguarding user privacy and legal rights throughout data collection, storage, and use; all of our datasets comply with GDPR, CCPA, and PIPL.

Korean, English, reading, mixed speech

Copyright © 2023 NEXDATA TECHNOLOGY INC
