234 Hours-Japanese Speech Dataset (Mobile Phone Recordings)

Japanese audio dataset

Japanese ASR training data

Japanese spontaneous dialogue dataset

Japanese speech dataset

This dataset contains 234 hours of Japanese speech audio, recorded as monologues based on given scripts and covering 210,000 formal and informal expressions. Each recording is transcribed with text content and other attributes. The data was collected from a broad, geographically diverse set of speakers (799 Japanese speakers recorded in mixed conditions such as indoors, at the roadside, and in restaurants), which helps improve model performance on real-world, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.

Recommended Dataset

240 Hours - Hindi(India) Speech Dataset (Scripted Monologue)

This dataset was collected from monologues based on given scripts, covering economy, entertainment, news, informal language, numbers, the alphabet, and other domains. Each recording is transcribed with text content and other attributes. The data was collected from a broad, geographically diverse set of speakers (401 Indian speakers recorded in quiet and noisy conditions), which helps improve model performance on real-world, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

hindi phone call speech dataset hindi tts dataset hindi speech corpus hindi audio dataset hindi asr dataset hindi telephony speech dataset hindi dialogue speech dataset hindi conversational speech dataset

227 Hours Multi-Accent Spanish Speech Dataset with Transcripts for ASR Training

This dataset contains 227 hours of Spanish scripted speech recorded as monologues based on predefined texts. It includes recordings from 352 native speakers from Spain, Mexico, Venezuela, and other Spanish-speaking regions. The speech content covers economics, entertainment, news, informal language, numbers, alphabet sequences, and other general domains. Each recording includes high-quality transcripts, timestamps, noise labels, and additional speaker metadata. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

spanish speech to text dataset spanish asr dataset spanish speech recognition dataset spanish speech dataset for asr training smartphone speech dataset

231.9 Hours French Speech Dataset with Diverse Speakers for AI Training

This dataset contains 231.9 hours of French scripted monologue speech collected from native speakers based on predefined scripts. The recordings cover multiple domains, including economy, entertainment, news, informal speech, numbers, and alphabet sequences. Each audio sample includes accurate transcripts and additional metadata. The dataset was collected from 406 speakers from France, Canada, and French-speaking regions of Africa, which helps improve model performance on real-world, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

french speech dataset french audio dataset french asr dataset european speech dataset voice assistant dataset

199 Hours - English(the United Kingdom) English Scripted Monologue Smartphone speech dataset

This English (United Kingdom) scripted monologue smartphone speech dataset was collected from monologues based on given scripts, covering economy, entertainment, news, informal language, numbers, the alphabet, and other domains. Each recording is transcribed with text content and other attributes. The data was collected from a broad, geographically diverse set of speakers (346 British speakers), which helps improve model performance on real-world, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

British English pronunciation mobile phone voice data collection voice reading English data

215 Hours American English Speech Dataset with Reading-style Speech

This dataset contains 215 hours of American English scripted read speech collected from 349 native speakers using mobile devices. The recordings are based on predefined scripts covering various content categories, including economy, entertainment, news, informal speech, numbers, and alphabet reading. Each audio sample is transcribed with corresponding text content and additional metadata. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

us english speech dataset english ASR dataset american english speech dataset speech AI dataset multilingual speech dataset

127 Hours - Malay(Malaysia) Scripted Monologue Smartphone speech dataset

This Malay (Malaysia) scripted monologue smartphone speech dataset was collected from monologues based on given scripts, covering economy, entertainment, news, informal language, numbers, the alphabet, and other domains. Each recording is transcribed with text content and other attributes. The data was collected from a broad, geographically diverse set of speakers (156 Malaysian speakers), which helps improve model performance on real-world, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

Malay data mobile phone collected voice data reading voice Malaysian voice

359 Hours - Indonesian(Indonesia) Scripted Monologue Smartphone speech dataset

This Indonesian (Indonesia) scripted monologue smartphone speech dataset was collected from monologues based on given scripts, covering economy, entertainment, news, informal language, numbers, and the alphabet. Each recording is transcribed with text content and other attributes. The data was collected from a broad, geographically diverse set of speakers (496 speakers), which helps improve model performance on real-world, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

Indonesian data mobile phone collected voice data read voice Indonesian voice

203 Hours Thai ASR Dataset with Prompt-based Speech Recordings

This dataset contains 203 hours of Thai speech collected through prompted monologue recordings using smartphone devices. The recordings are based on predefined scripts and cover multiple domains, including economics, entertainment, news, conversational language, numbers, and letters. Each audio sample includes accurate text transcripts and related metadata. The dataset was collected from 498 native Thai speakers with diverse demographic and geographic backgrounds, which helps improve model performance on real-world, complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

Thai Speech Dataset Thai ASR Dataset Thai Speech Recognition Dataset Thai Audio Dataset
