What Is a Speech Recognition Dataset?

From: Nexdata    Date: 2024-08-15

➤ Speech recognition dataset issues

The application fields of artificial intelligence are expanding fast, and the driving force behind this is the richness and diversity of datasets. Whether in medical image analysis, autonomous driving, or smart home systems, the accumulation of large amounts of data opens up a wide range of AI application scenarios.

Over the past ten years, driven by deep learning, speech recognition technology and its applications have developed rapidly. Products and services built on speech recognition, such as voice search, voice input methods, smart speakers, smart TVs, smart wearables, intelligent customer service, and robots, are now widely used in many aspects of our lives.

It is well known that whether a speech recognition system is cost-effective depends largely on the speech recognition dataset behind it. Many of the speech recognition datasets on the market are small, cover a single scene, and lack challenging material, so they cannot reflect how well a research model generalizes to large-scale data and complex scenarios. Industry often uses larger-scale speech recognition datasets for research, while academia cannot obtain this data, which leads to a serious split between speech recognition research in academia and in industry. In addition, unsupervised learning and self-learning, two current research hotspots, also lack the support of large-scale datasets in the field of speech recognition.

Nexdata Speech Recognition Dataset

➤ Speech Recognition Datasets

1,505 Hours-Mandarin Speech Recognition Dataset by Mobile Phone

This dataset contains recordings from 6,278 speakers (2,980 male and 3,298 female) from 33 provinces of China. The recorded content consists of commonly used colloquial sentences, captured in both quiet and noisy environments. The annotated texts are transcribed and proofread by professional annotators, with an accuracy of at least 98%.

344 People - American English Speech Recognition Dataset by Mobile Phone_Guiding

This dataset contains speech from 344 American English speakers, all of whom are American locals. Each speaker recorded 50 sentences, and the valid data totals 9.7 hours. It was recorded in a quiet environment. The content covers in-car scenarios, smart home, and speech assistant use cases.
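The figures above allow a quick sanity check on how long a typical recorded sentence is. The sketch below is only an illustration: the function name is hypothetical, and it assumes every speaker contributed exactly 50 valid sentences, which the listing does not state explicitly.

```python
def average_utterance_seconds(valid_hours, speakers, sentences_per_speaker):
    """Estimate the mean sentence length in seconds from the headline figures.

    Assumption: every speaker contributed exactly `sentences_per_speaker`
    valid sentences, so the utterance count is a simple product.
    """
    total_utterances = speakers * sentences_per_speaker
    return valid_hours * 3600 / total_utterances


# Figures quoted for the American English dataset: 9.7 hours, 344 speakers,
# 50 sentences per speaker.
avg = average_utterance_seconds(9.7, 344, 50)
print(f"{avg:.2f} seconds per sentence")  # prints "2.03 seconds per sentence"
```

Under that assumption each sentence averages about two seconds, which is consistent with short command-style prompts.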

201 Hours – North American English Speech Recognition Dataset by Mobile Phone and PC

This dataset contains speech from 302 North American speakers. The recorded content includes phrases and sentences covering a wide variety of scenes. The valid duration is 201 hours. It was recorded in a quiet indoor environment. The recording devices include PCs, Android phones, and iPhones.

986 Hours - European Portuguese Speech Recognition Dataset by Mobile Phone

This dataset contains speech from 2,109 native Portuguese speakers with authentic accents. The recorded text was designed by professional language experts and is rich in content, covering multiple categories such as general-purpose, interactive, in-vehicle, and household commands. The recording environment is quiet and free of echo. The texts are manually transcribed with a high accuracy rate. The recording devices are mainstream Android phones and iPhones.

211 Hours - German Speech Recognition Dataset by Mobile Phone_Reading

This dataset contains speech from 327 native German speakers. The recorded content covers economics, entertainment, news, spoken language, numbers, letters, and more. Each sentence contains 10.3 words on average and is repeated 1.4 times on average. All texts are manually transcribed to ensure high accuracy.

490 People - Thai Speech Recognition Dataset by Mobile Phone_Guiding

The Thai Speech Recognition Dataset (guiding) was collected from 490 native speakers from Thailand and recorded in a quiet environment. The recordings are rich in content, covering multiple categories such as in-car scenes, smart home, and voice assistant. Each speaker recorded 50 sentences, and the valid volume is 15 hours. All texts are manually transcribed with high accuracy.

351 People – Italian Speech Recognition Dataset by Mobile Phone_Guiding

This Italian speech recognition dataset consists of conversations collected by phone from 351 speakers, with a balanced gender ratio and geographical distribution. Speakers chose from topics designed by linguistic experts and held conversations on them. Each speaker contributed 50 sentences. The recording devices are various mobile phones. The audio format is 16 kHz, 16-bit, uncompressed WAV, and all of the data was recorded in quiet indoor environments. All of the data was manually transcribed with the text content, the start and end time of each effective sentence, and speaker identification. The sentence accuracy rate is ≥ 95%.
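When working with audio delivered in a stated format such as 16 kHz, 16-bit, uncompressed WAV, it can help to confirm that each file matches before training. The following is a minimal sketch using Python's standard `wave` module; the function name and its default values are assumptions chosen for illustration and are not part of any Nexdata tooling.

```python
import wave


def check_wav_format(path, expected_rate=16000, expected_sample_width=2):
    """Return True if the WAV file is uncompressed PCM at the expected
    sample rate (Hz) and sample width (bytes; 2 bytes = 16 bit)."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == expected_rate
            and wav.getsampwidth() == expected_sample_width
            and wav.getcomptype() == "NONE"  # "NONE" means uncompressed PCM
        )
```

A typical use would be to call `check_wav_format` on every file in a delivery and log any path for which it returns False. Note that the `wave` module raises `wave.Error` for files it cannot parse, so a batch script would likely want to catch that exception as well.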

If you would like more details about these speech recognition datasets or how to acquire them, please feel free to contact us: [email protected].

Standing at the forefront of the technology revolution, we are well aware of the power of data. In the future, as data collection and annotation processes continuously improve, AI systems will become more intelligent. All industries should actively embrace data-driven innovation to stay ahead in a fiercely competitive market and bring more value to society.
