Voice assistants have become a standard feature on smartphones. Apple’s Siri, Amazon’s Alexa, and Samsung’s Bixby are representative examples.
Speech technology is one of the areas where artificial intelligence has made the fastest breakthroughs, and the error rate of speech recognition has dropped from nearly a third in 2012 to about 3% today. This technological breakthrough allows machines to “hear” and, in a sense, “understand” human thoughts and intentions.
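The article does not say which metric the quoted error rates refer to. Assuming they mean word error rate (WER), the usual measure for speech recognition, the sketch below shows how such a figure can be computed: the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The function name and the example sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein dynamic programming).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of four gives a WER of 25%.
print(word_error_rate("turn on the light", "turn off the light"))
```

Under this assumption, an error rate of about 3% corresponds to roughly three word-level mistakes per hundred reference words.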
When it comes to speech technology, many people think of speech input methods or speech-to-text in WeChat, but these are only applications of automatic speech recognition (ASR). Speech technology also includes many other branches, such as voiceprint recognition, text-to-speech (TTS), voice cloning, and speech enhancement. The most promising application in the future is undoubtedly the voice assistant.
A voice assistant carries out the user’s command through human-machine dialogue. The process has four steps: first, speech recognition converts the speech into text; next, natural language processing (NLP) analyzes and interprets the text; then the backend acts on the command; finally, speech synthesis delivers the response as audio. Together, these steps make up the whole human-machine dialogue.
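The flow described above can be sketched as four chained functions. This is only a minimal illustration: every function name, the keyword rules, and the canned replies are hypothetical stand-ins, and a real assistant would call actual ASR, NLP, and TTS models in place of the stubs.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    name: str  # hypothetical intent label, e.g. "get_weather"
    text: str  # transcript the intent was derived from


def recognize_speech(audio: bytes) -> str:
    """ASR stand-in: treats the audio bytes as UTF-8 text of what was said."""
    return audio.decode("utf-8").strip().lower()


def understand_text(text: str) -> Intent:
    """NLP stand-in: maps the transcript to an intent with keyword rules."""
    if "weather" in text:
        return Intent("get_weather", text)
    if "time" in text:
        return Intent("get_time", text)
    return Intent("unknown", text)


def execute_command(intent: Intent) -> str:
    """Backend stand-in: looks up a canned text reply for the intent."""
    replies = {
        "get_weather": "It is sunny today.",
        "get_time": "It is nine o'clock.",
    }
    return replies.get(intent.name, "Sorry, I did not understand that.")


def synthesize_speech(reply: str) -> bytes:
    """TTS stand-in: returns bytes in place of a synthesized waveform."""
    return reply.encode("utf-8")


def handle_utterance(audio: bytes) -> bytes:
    """Run one dialogue turn: speech -> text -> intent -> reply -> speech."""
    text = recognize_speech(audio)
    intent = understand_text(text)
    reply = execute_command(intent)
    return synthesize_speech(reply)


print(handle_utterance(b"What is the weather like?"))
```

Keeping each stage behind its own function means any single stage, for example the recognizer, can be replaced without touching the rest of the chain.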
As a world-leading AI data provider, Datatang has for many years adhered to the corporate vision of “Empower AI with data and change the world with intelligence”. To help more researchers broaden their research fields, enrich their research content, and accelerate technological iteration, Datatang has developed a series of speech datasets for voice assistants, covering multiple languages and domains, such as reading speech, natural dialogue, mixed speech, and children’s speech.
Reading Speech Data
The dataset contains speech data from 349 American English speakers, all of whom are American locals. The recorded content covers categories such as economics, entertainment, news, and spoken language.
The dataset contains speech data from 346 British English speakers, all of whom are English locals. The recorded content covers categories such as economics, news, entertainment, commonly used spoken language, letters, and numbers.
The dataset contains speech data from 799 Japanese locals, recorded in quiet indoor places, on streets, and in restaurants. The recorded content covers fields such as economy, entertainment, news, and spoken language.
Natural Dialogue Data
The dataset contains 1,000 hours of American English conversational speech data, recorded by 2,000 native speakers. The speakers build each conversation around a familiar topic to keep it smooth and natural.
The dataset contains 500 hours of French conversational speech data, recorded by about 1,000 native speakers. The speakers build each conversation around a familiar topic to keep it smooth and natural.
Nearly 300 speakers took part in the recording and talked face to face in a natural way. They discussed a number of given topics freely, covering a wide range of fields; the speech is natural and fluent, consistent with real dialogue scenarios.
Mixed Speech Data
The data is recorded by native Chinese speakers whose accents cover seven major dialect areas. The recorded text is a mixture of Chinese and English sentences, covering general scenarios and human-computer interaction scenarios.
Children’s Speech Data
The 9,780 speakers are children aged 6 to 12, with accents covering seven Chinese dialect regions. The content covers language common among children, such as stories and numbers, as well as their interactions in the car, at home, and with voice assistants.
The data is recorded by 219 American children who are native speakers. The recording texts are mainly storybooks, children’s songs, and spoken expressions. Each speaker recorded 350 sentences. Each sentence contains 4.5 words on average and is repeated 2.1 times on average.
The ultimate goal of voice assistant technology is to be a real personal assistant that can complete tasks of a certain complexity and help you obtain the information you need. As the technology matures and is more widely applied, voice assistants will become the mainstream way of operating mobile devices.
If you need data services, please feel free to contact us: email@example.com