Conversational Speech Data

From: Nexdata  Date: 2024-04-02

As speech recognition is deployed in more natural scenarios such as smart customer service and smart meetings, models trained on read-aloud speech data no longer perform satisfactorily.

In daily life, speakers pronounce words more naturally, so their speech contains a lot of slurring (legato), swallowed sounds, distorted pronunciation, and unclear articulation. Speakers usually do not deliberately control their voice or pronunciation habits, and several people may talk at the same time. Complex phenomena such as interrupted sentences, rushed words, and overlapping speech also occur, so recognition accuracy on this natural conversational style is still far from ideal.

Data is the foundation of artificial intelligence. For a model to reach higher accuracy, it needs a training data set that closely matches the application scenario. Natural conversational speech data has therefore become one of the data sets the industry needs most urgently.

Nexdata has nearly 40,000 hours of natural conversational speech data, covering Mandarin Chinese, dialects, English, Japanese, Korean, Hindi, Vietnamese, Arabic, Spanish, French, German, Italian, and other languages. The speakers come from different regions and cities, with balanced age and gender coverage. All audio has undergone strict manual transcription and quality inspection, and is annotated with the text content, the start and end times of valid sentences, the identity of the speaker, and more. The sentence accuracy rate is as high as 95%.
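To make the annotation fields and the accuracy figure more concrete, here is a minimal Python sketch. It is not Nexdata's format or scoring method: the `Segment` field names are hypothetical, the sample sentences are invented for illustration, and "sentence accuracy" is assumed to mean the share of sentences whose transcribed text matches a reference exactly after trimming whitespace and lowercasing.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One annotated sentence. Field names are hypothetical."""
    speaker_id: str
    start_sec: float
    end_sec: float
    text: str


def sentence_accuracy(transcribed, reference):
    """Share of sentences whose text matches the reference exactly.

    Assumed definition: texts are compared after trimming whitespace
    and lowercasing. Both lists must be aligned sentence by sentence.
    """
    if len(transcribed) != len(reference):
        raise ValueError("both lists must contain the same number of sentences")
    if not reference:
        raise ValueError("no sentences to compare")
    correct = sum(
        t.text.strip().lower() == r.text.strip().lower()
        for t, r in zip(transcribed, reference)
    )
    return correct / len(reference)


# Invented sample data: overlapping times show two speakers talking at once.
reference = [
    Segment("A", 0.00, 1.80, "How was your weekend?"),
    Segment("B", 1.65, 3.20, "Pretty good, thanks."),
    Segment("A", 3.40, 5.10, "Did you go anywhere?"),
    Segment("B", 5.00, 6.75, "Just to the park."),
]
transcribed = [
    Segment("A", 0.00, 1.80, "How was your weekend?"),
    Segment("B", 1.65, 3.20, "Pretty good, thanks."),
    Segment("A", 3.40, 5.10, "Did you go anywhere?"),
    Segment("B", 5.00, 6.75, "Just to the bark."),  # one wrong word
]

print(sentence_accuracy(transcribed, reference))  # 3 of 4 match: prints 0.75
```

Under this strict definition a single wrong word fails the whole sentence, which is why a sentence-level rate is usually lower than a word-level rate for the same transcripts.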

1,136 Hours - American English Conversational Speech Data by Mobile Phone

The 1,136-hour American English conversational speech data set, collected by mobile phone, involves more than 1,000 native English speakers in the United States, with a balanced gender ratio and geographical distribution. Speakers chose a few familiar topics from a given list and started conversations, which keeps the dialogues fluent and natural. The recording devices are various mobile phones. The audio format is 16 kHz, 16-bit, uncompressed WAV, and all speech was recorded in quiet indoor environments. All audio was manually transcribed with text content, the start and end time of each effective sentence, and speaker identification. The sentence accuracy rate is 95%.
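A quick way to confirm that a delivered file matches the stated audio format (16 kHz, 16-bit, uncompressed WAV) is to read its header. The sketch below uses Python's standard `wave` module; the function names are hypothetical, and `make_test_wav` only builds a small in-memory file so the example runs without any real data.

```python
import io
import wave


def describe_wav(source):
    """Return (sample_rate_hz, bits_per_sample, channels, compression).

    `source` may be a file path or a binary file-like object.
    The wave module only reads PCM; other encodings raise wave.Error.
    """
    with wave.open(source, "rb") as wav:
        return (
            wav.getframerate(),
            wav.getsampwidth() * 8,
            wav.getnchannels(),
            wav.getcomptype(),
        )


def matches_spec(source, rate_hz=16000, bits=16):
    """True if the file is uncompressed PCM at the expected rate and depth."""
    rate, depth, _channels, comp = describe_wav(source)
    return rate == rate_hz and depth == bits and comp == "NONE"


def make_test_wav(rate_hz, sample_width_bytes, seconds=1):
    """Build a small in-memory mono WAV of zero-valued bytes (demo only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width_bytes)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00" * (rate_hz * sample_width_bytes * seconds))
    buffer.seek(0)  # rewind so the buffer can be read back
    return buffer


print(matches_spec(make_test_wav(16000, 2)))  # 16 kHz, 16-bit: prints True
print(matches_spec(make_test_wav(8000, 2)))   # 8 kHz, 16-bit: prints False
```

For real files, pass a path such as `matches_spec("sample.wav")`, where `sample.wav` stands in for any file from the data set.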

607 Hours - Cantonese Conversational Speech Data by Mobile Phone and Voice Recorder

The 607-hour Cantonese conversational speech data set involves 995 native speakers. Speakers chose a few familiar topics from a given list and started conversations, which keeps the dialogues fluent and natural. The recording devices are various mobile phones and professional audio recorders. The audio format is 16 kHz, 16-bit, uncompressed WAV, and all speech was recorded in quiet indoor environments. All audio was manually transcribed with text content. The start and end time of each effective sentence, speaker identification, and other attributes are also annotated. The sentence accuracy rate is 95%.

500 Hours - Korean Conversational Speech Data by Mobile Phone

The 500-hour Korean conversational speech data set, collected by mobile phone, involves more than 700 native speakers, with a balanced gender ratio. Speakers chose a few familiar topics from a given list and started conversations, which keeps the dialogues fluent and natural. The recording devices are various mobile phones. The audio format is 16 kHz, 16-bit, uncompressed WAV, and all speech was recorded in quiet indoor environments. All audio was manually transcribed with text content, the start and end time of each effective sentence, and speaker identification. The sentence accuracy rate is 95%.

500 Hours - Italian Conversational Speech Data by Mobile Phone

The 500-hour Italian conversational speech data set involves more than 700 native speakers, with a balanced gender ratio. Speakers chose a few familiar topics from a given list and started conversations, which keeps the dialogues fluent and natural. The recording devices are various mobile phones. The audio format is 16 kHz, 16-bit, uncompressed WAV, and all speech was recorded in quiet indoor environments. All audio was manually transcribed with text content, the start and end time of each effective sentence, and speaker identification. The word accuracy rate is 98%.

100 Hours - Russian Conversational Speech Data by Mobile Phone

The 100-hour Russian conversational speech data set involves more than 130 native speakers, with a balanced gender ratio. Speakers chose a few familiar topics from a given list and started conversations, which keeps the dialogues fluent and natural. The recording devices are various mobile phones. The audio format is 16 kHz, 16-bit, uncompressed WAV, and all speech was recorded in quiet indoor environments. All audio was manually transcribed with text content, the start and end time of each effective sentence, and speaker identification.
