435 Hours European Spanish Speech Dataset for ASR, Smart Home & In-Car Voice AI

european spanish speech dataset

spanish spain speech dataset

spanish speech dataset

spanish audio dataset

european spanish asr dataset

This dataset contains 435 hours of European Spanish scripted speech recorded by 989 native speakers. The dataset includes read monologues based on predefined prompts across general speech, news, numbers, human-machine interaction, smart home command and control, and in-car voice command scenarios. Each recording includes transcription text and speaker metadata. The dataset has been quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.

Recommended Dataset

240 Hours - Hindi(India) Speech Dataset (Scripted Monologue)

This dataset was collected from monologues based on given scripts, covering economy, entertainment, news, informal language, numbers, alphabet, and other domains. Recordings are transcribed with text content and other attributes. The dataset was collected from a large, geographically diverse group of speakers (401 Indian speakers, recorded in quiet and noisy conditions), which helps improve model performance on real-world, complex tasks. The dataset has been quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

hindi phone call speech dataset hindi tts dataset hindi speech corpus hindi audio dataset hindi asr dataset hindi telephony speech dataset hindi dialogue speech dataset hindi conversational speech dataset

227 Hours Multi-Accent Spanish Speech Dataset with Transcripts for ASR Training

This dataset contains 227 hours of Spanish scripted speech recorded as monologues based on predefined texts. It includes recordings from 352 native speakers from Spain, Mexico, Venezuela, and other Spanish-speaking regions. The speech content covers economics, entertainment, news, informal language, numbers, alphabet sequences, and other general domains. Each recording includes high-quality transcripts, timestamps, noise labels, and additional speaker metadata. The dataset has been quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

spanish speech to text dataset spanish asr dataset spanish speech recognition dataset spanish speech dataset for asr training smartphone speech dataset

231.9 Hours French Speech Dataset with Diverse Speakers for AI Training

This dataset contains 231.9 hours of French scripted monologue speech collected from native speakers based on predefined scripts. The recordings cover multiple domains, including economy, entertainment, news, informal speech, numbers, and alphabet sequences. Each audio sample includes accurate transcripts and additional metadata. The dataset was collected from 406 speakers from France, Canada, and African French-speaking regions, which helps improve model performance on real-world, complex tasks. The dataset has been quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

french speech dataset french audio dataset french asr dataset european speech dataset voice assistant dataset

199 Hours British English Speech Dataset for ASR and Voice AI Training

This dataset contains 199 hours of scripted British English speech collected from 346 native British speakers using smartphone devices. The recordings are based on predefined texts covering various domains, including news, entertainment, economy, informal expressions, numbers, alphabetic content, and other general speech scenarios. Each audio sample is transcribed with corresponding text content and additional metadata attributes. The dataset has been quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

british english speech dataset uk english speech dataset english ASR dataset read speech dataset speech recognition training data native british speech dataset tts training data

215 Hours American English Speech Dataset with Reading-style Speech

This dataset contains 215 hours of American English scripted read speech collected from 349 native speakers using mobile devices. The recordings are based on predefined scripts covering various content categories, including economy, entertainment, news, informal speech, numbers, and alphabet reading. Each audio sample is transcribed with corresponding text content and additional metadata. The dataset has been quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

us english speech dataset english ASR dataset american english speech dataset speech AI dataset multilingual speech dataset

127 Hours - Malay(Malaysia) Scripted Monologue Smartphone speech dataset

This Malay (Malaysia) scripted monologue smartphone speech dataset was collected from monologues based on given scripts, covering economy, entertainment, news, informal language, numbers, alphabet, and other domains. Recordings are transcribed with text content and other attributes. The dataset was collected from a geographically diverse group of 156 Malaysian speakers, which helps improve model performance on real-world, complex tasks. The dataset has been quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

Malay data mobile phone collected voice data reading voice Malaysian voice

359 Hours - Indonesian(Indonesia) Scripted Monologue Smartphone speech dataset

This Indonesian (Indonesia) scripted monologue smartphone speech dataset was collected from monologues based on given scripts, covering the economy, entertainment, news, informal language, numbers, and alphabet domains. Recordings are transcribed with text content and other attributes. The dataset was collected from a geographically diverse group of 496 speakers, which helps improve model performance on real-world, complex tasks. The dataset has been quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

Indonesian data mobile phone collected voice data read voice Indonesian voice

203 Hours Thai ASR Dataset with Prompt-based Speech Recordings

This dataset contains 203 hours of Thai speech collected through prompted monologue recordings using smartphone devices. The recordings are based on predefined scripts and cover multiple domains, including economics, entertainment, news, conversational language, numbers, and letters. Each audio sample includes accurate text transcripts and related metadata. The dataset was collected from 498 native Thai speakers with diverse demographic and geographic backgrounds, which helps improve model performance on real-world, complex tasks. The dataset has been quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets comply with GDPR, CCPA, and PIPL.

Thai Speech Dataset Thai ASR Dataset Thai Speech Recognition Dataset Thai Audio Dataset

