At its core, ASR is a pattern recognition system with three basic units: feature extraction, pattern matching, and reference templates. The input speech is first preprocessed, and then its characteristic features are extracted. From these features, the templates required for speech recognition are built; the reference templates stored in the computer are then compared with the features of the input speech signal to find the template that best matches the input.
The best-matching template then yields the computer's recognition result. The quality of this result depends directly on the choice of features, the quality of the speech model, and the accuracy of the templates, all of which require continuous training on large audio datasets.
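The feature-extraction-and-template-matching pipeline described above can be sketched roughly as follows. This is an illustrative toy, not Datatang's actual implementation: the per-frame amplitude feature stands in for real acoustic features such as MFCCs, and dynamic time warping (DTW) stands in for the pattern-matching unit.

```python
# Toy sketch of template-based speech recognition:
# 1) extract a feature sequence from the input signal,
# 2) compare it against stored reference templates,
# 3) return the best-matching template.

def extract_features(signal, frame=4):
    # Crude feature: mean absolute amplitude per frame.
    # Real systems use richer features such as MFCCs.
    return [sum(abs(x) for x in signal[i:i + frame]) / frame
            for i in range(0, len(signal) - frame + 1, frame)]

def dtw_distance(a, b):
    # Classic O(len(a) * len(b)) dynamic-time-warping distance,
    # which tolerates differences in speaking rate.
    inf = float("inf")
    d = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    d[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j],      # insertion
                                 d[i][j - 1],      # deletion
                                 d[i - 1][j - 1])  # match
    return d[len(a)][len(b)]

def recognize(signal, templates):
    # templates: dict mapping a label to a stored feature sequence.
    # Pick the label whose template features best match the input.
    feats = extract_features(signal)
    return min(templates, key=lambda name: dtw_distance(feats, templates[name]))
```

A real recognizer would build its templates (or statistical models) from large transcribed audio corpora, which is why dataset scale and quality matter so much in practice.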
Therefore, the success of speech recognition technology depends largely on large-scale, high-quality audio datasets. Datatang has accumulated multi-channel, multi-environment, and multi-type audio datasets covering more than 60 languages.
This dataset contains recordings from 344 American English speakers, all native to the United States, with 50 sentences per speaker and 9.7 hours of valid data. It was recorded in a quiet environment, and the content covers in-car, smart-home, and speech-assistant scenarios.
This dataset contains recordings from 346 British English speakers, all native to the United Kingdom, with around 392 sentences per speaker and 199 hours of valid data. It was recorded in a quiet environment, and the content spans categories such as economics, news, entertainment, common spoken expressions, letters, and figures.
351 People – German Audio Dataset by Mobile Phone_Guiding was collected from 351 native German speakers with authentic accents. The recording scripts were designed by professional language experts and are rich in content, covering categories such as general-purpose, interactive, in-vehicle, and household commands. The recording environment is quiet and echo-free.
401 speakers participated in this recording, with 50 sentences per speaker and 10.9 hours in total. The recording texts cover in-car, smart-home, and smart speech-assistant scenarios, and were manually transcribed for accuracy.
397 People - Hindi Audio Dataset by Mobile Phone_Guiding was recorded by 397 Indian speakers with authentic accents, with 50 sentences per speaker and 8.6 hours in total. The content covers in-car, smart-home, and intelligent voice-assistant scenarios.
1960 native Russian speakers with authentic accents participated in the recording. The scripts were designed by linguists and cover a wide range of topics, including generic, interactive, in-vehicle, and home commands.
If you would like to know more about these audio datasets or how to acquire them, please feel free to contact us: [email protected].
Speech emotion recognition generally refers to the process by which a machine automatically recognizes human emotions and emotion-related states from speech.
Datatang will participate in the 10th-anniversary exhibition of Tech.AD Europe in Berlin from March 26-28, and plans new ways of engaging the audience with Datatang's latest autonomous driving data solutions.