American English Speech Recognition Data

From: Nexdata Date: 2024-08-15

➤ Importance of voice in HCI

The rapid development of artificial intelligence depends on the support of high-quality datasets. Whether in commercial applications or scientific research, datasets provide a continuous source of power for AI technology. Datasets are not only the input for algorithm training but also a determining factor in the maturity of AI technology. By using real-world data, researchers can train more robust AI models that handle unpredictable changes in scenarios.

The importance of voice to human-computer interaction (HCI) is beyond doubt. The Chinese-English translation quality of some intelligent translation devices on the market has reached a professional level. However, because of the large number of languages and their differing pronunciation systems and pronunciation habits, multilingual speech recognition still faces great challenges.

Nexdata has been deeply involved in AI data services for many years. It has a professional data processing team, strong data collection and processing capabilities, and rich practical experience in data collection and labeling. With more than 12 years of data service experience, Nexdata has accumulated a wealth of American English speech recognition data.

➤ American English speech data

215 Hours - American English Speech Data by Mobile Phone_Reading

The dataset contains speech data from 349 American English speakers, all of whom are American locals. It was recorded in a quiet environment. The recorded content covers categories such as economics, entertainment, news and spoken language. It is manually transcribed and annotated with start and end time points.

1,136 Hours – American English Conversational Speech Data by Mobile Phone

This 1,136-hour dataset of natural American English conversations was collected by phone from more than 1,000 native English speakers in America, with a balanced gender ratio and geographical distribution. Speakers chose a few familiar topics from a given list and started conversations on them, to ensure the dialogues were fluent and natural. The recording devices are various mobile phones. The audio format is 16 kHz, 16-bit, uncompressed WAV, and all the speech data was recorded in quiet indoor environments. All the audio was manually transcribed with the text content, the start and end time of each effective sentence, and speaker identification. The sentence accuracy rate is ≥ 95%.
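As a rough illustration of the stated audio format, the sketch below checks whether a WAV file is 16 kHz, 16-bit and uncompressed, using Python's standard `wave` module. The function names `check_wav_format` and `make_silent_wav` are hypothetical helpers written for this example and are not part of any Nexdata tooling. The channel count is reported but not checked, since the description does not state it.

```python
import io
import wave

# Format stated in the dataset description: 16 kHz, 16-bit, uncompressed WAV.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits = 2 bytes per sample


def check_wav_format(source):
    """Return (ok, info) for a WAV file path or binary file-like object."""
    with wave.open(source, "rb") as wav:
        info = {
            "sample_rate_hz": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),
            "channels": wav.getnchannels(),  # reported only, not validated
            "compression": wav.getcomptype(),  # "NONE" means uncompressed PCM
            "duration_s": wav.getnframes() / wav.getframerate(),
        }
    ok = (
        info["sample_rate_hz"] == EXPECTED_RATE_HZ
        and info["sample_width_bytes"] == EXPECTED_SAMPLE_WIDTH_BYTES
        and info["compression"] == "NONE"
    )
    return ok, info


def make_silent_wav(rate_hz, seconds):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(rate_hz * seconds))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    ok, info = check_wav_format(make_silent_wav(16000, 0.5))
    print(ok, info)
```

In practice the same function could be pointed at each delivered file path to flag any recording that does not match the advertised format before training.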

800 Hours - American English Speech Data by Mobile Phone

1,842 native American English speakers with authentic accents participated in the recording. The recording script was designed by linguists, is based on scenes, and covers a wide range of topics including generic, interactive, on-board and home. The text is manually proofread with high accuracy. It matches mainstream Android and Apple phones.

➤ 344 American English speakers' speech data

344 People - American English Speech Data by Mobile Phone_Guiding

The dataset contains speech data from 344 American English speakers, all of whom are American locals. Each speaker recorded 50 sentences, and the valid data totals 9.7 hours. It was recorded in a quiet environment. The content covers in-car, smart home and speech assistant scenarios.
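For a sense of scale, the figures above can be combined into a back-of-the-envelope estimate of utterance count and average utterance length. This is only a sketch: it assumes every one of the 50 sentences per speaker is counted in the 9.7 valid hours, which the description does not confirm.

```python
# Figures taken from the dataset description.
speakers = 344
sentences_per_speaker = 50
valid_hours = 9.7

# Assumption: all recorded sentences are included in the valid data.
total_sentences = speakers * sentences_per_speaker
avg_sentence_seconds = valid_hours * 3600 / total_sentences

print(f"{total_sentences} sentences, about {avg_sentence_seconds:.2f} s each")
```

Under that assumption the dataset holds 17,200 short utterances of roughly two seconds each, which is consistent with brief command-style prompts for in-car, smart home and assistant scenarios.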

On the road to an intelligent future, data will always be an indispensable driving force. The continuous expansion and optimization of all kinds of datasets will provide a broader application space for AI algorithms. By constantly exploring new data collection and annotation methods, all industries can better handle complex application scenarios. If you have data requirements, please contact Nexdata.ai at [email protected].
