
Upgrade Your Speech Recognition Models with Large Scale Data

From: Nexdata  Date: 2024-04-02

Market data shows that the global smart voice market grew from US$11.03 billion in 2017 to US$26.39 billion in 2021, and reached US$35.12 billion in 2022, maintaining a high growth rate of 33.1%. It was expected to reach US$39.92 billion in 2023.

In the past ten years, speech recognition technology has made great progress. Continuous-speech and speaker-independent real-time recognition systems have been successfully developed in the laboratory, and many speech recognition technologies have reached the deployment stage. However, real-world applications still face various challenges, which fall mainly into three categories: robustness, low resources, and complex scenarios.

Typical robustness problems include accents and dialects, mixed or multilingual speech, and domain adaptation. Low-resource scenarios are those where deployment resources are limited and annotated data is scarce: the former is typified by the model-size and compute constraints of edge devices in AIoT scenarios, while the lack of training data is a key factor limiting the development of speech recognition in various vertical domains and languages.

To address the shortage of speech recognition data, Nexdata has designed and developed 200,000 hours of speech recognition datasets covering more than 60 languages and dialects, including Mandarin Chinese, English, Japanese, Korean, Hindi, Vietnamese, Arabic, Spanish, French, German, Italian, and Portuguese.

344 People - American English Speech Data by Mobile Phone_Guiding

The dataset contains speech data from 344 American English speakers, all of whom are locals of the United States, each reading 50 sentences. The valid data totals 9.7 hours, recorded in a quiet environment. The content covers in-car, smart-home, and voice-assistant scenarios.

520 Hours - French Speaking English Speech Data by Mobile Phone

1,089 French native speakers participated in the recording, with authentic accents. The recording script was designed by linguists and covers a wide range of topics, including generic, interactive, in-car, and home scenarios. The text was manually proofread with high accuracy. The recordings match mainstream Android and Apple phones. The dataset can be applied to automatic speech recognition and machine translation.

211 Hours - German Speech Data by Mobile Phone_Reading

The dataset contains speech data from 327 German native speakers. The recording content includes economics, entertainment, news, colloquial speech, figures, letters, etc. Each sentence contains 10.3 words on average and is repeated 1.4 times on average. All texts were manually transcribed to ensure high accuracy.

347 Hours-Italian Speech Data Collected by Mobile Phone

Italian audio data captured by mobile phone, with a total duration of 347 hours, recorded by 800 Italian native speakers with a balanced gender ratio. The recording environment is quiet, and all texts were manually transcribed with high accuracy. This dataset can be applied to automatic speech recognition, machine translation, and voiceprint recognition.

1,044 Hours - Brazilian Portuguese Speech Data by Mobile Phone

This dataset of natural phone conversations involved more than 2,038 native Brazilian Portuguese speakers, with a proper balance of gender ratio and geographical distribution. Speakers chose from topics designed by linguistic experts and conducted conversations on them. The recording devices were various mobile phones. The audio format is 16 kHz, 16-bit, uncompressed WAV, and all speech was recorded in quiet indoor environments. All audio was manually transcribed, including the text content, the start and end time of each valid sentence, and speaker identification. The sentence accuracy rate is ≥ 95%.
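Format specifications like the 16 kHz, 16-bit uncompressed WAV above can be verified programmatically before training. A minimal sketch using Python's standard-library `wave` module (the file name `sample.wav` and the helper `check_asr_wav` are hypothetical, not part of any Nexdata tooling):

```python
import wave

def check_asr_wav(path, expected_rate=16000, expected_sampwidth=2):
    """Return True if the WAV file is 16 kHz, 16-bit, uncompressed PCM."""
    with wave.open(path, "rb") as wf:
        return (wf.getframerate() == expected_rate
                and wf.getsampwidth() == expected_sampwidth
                and wf.getcomptype() == "NONE")

# Write a tiny silent WAV to demonstrate the check (hypothetical file).
with wave.open("sample.wav", "wb") as wf:
    wf.setnchannels(1)       # mono
    wf.setsampwidth(2)       # 16-bit samples
    wf.setframerate(16000)   # 16 kHz
    wf.writeframes(b"\x00\x00" * 1600)  # 0.1 s of silence

print(check_asr_wav("sample.wav"))  # True
```

A check like this is cheap to run over an entire corpus and catches resampled or compressed files before they silently degrade model quality.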

759 Hours - Hindi Speech Data by Mobile Phone

The data is 759 hours long and was recorded by 1,425 Indian native speakers with authentic accents. The recording text was designed by language experts and covers general, interactive, in-car, home, and other categories. The text was manually proofread with high accuracy. Recording devices are mainstream Android phones and iPhones. The dataset can be applied to speech recognition, machine translation, and voiceprint recognition.

234 Hours-Japanese Speech Data by Mobile Phone_R

The dataset collects speech from 799 Japanese locals, recorded in quiet indoor places, on streets, and in restaurants. The recordings include 210,000 commonly used written and spoken Japanese sentences. The sentence error rate of the transcription is less than 5%. Recording devices are mainstream Android phones and iPhones.
