Korean Medical Speech Dataset – 203 Hours of Clinical Conversations

Korean medical speech dataset

healthcare audio data

medical voice dataset

Korean clinical conversation dataset

domain-specific ASR

medical transcription Korean

doctor-patient audio Korean

medical chatbot dataset

This Korean Medical Speech Dataset contains 203 hours of real-world audio, including casual conversations and monologues. It spans a wide range of healthcare-related content such as medical consultations, academic lectures, training sessions, and clinical discussions. The dataset includes detailed annotations: transcripts, speaker ID, gender, and tagged medical entities. It is designed for use in ASR, medical NLU, speech-based healthcare assistants, and AI model fine-tuning for domain-specific speech recognition. The recordings were collected from a geographically diverse speaker base and validated by multiple AI companies. All data complies with GDPR, CCPA, and PIPL regulations.

This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.

Recommended Dataset

283 Hours - Indonesian(Indonesia) Spontaneous Dialogue Telephony speech dataset

Indonesian (Indonesia) spontaneous dialogue telephony speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed with text content, speaker ID, gender, age, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (376 native speakers), which helps improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets are GDPR, CCPA, and PIPL compliant.

Conversational Speech Telephony Indonesian

163 Hours Russian Children Speech Dataset – Real-World Speech Data for AI Training

A 163-hour Russian children’s speech dataset featuring real-world conversational and monologue recordings. The dataset captures natural speech from children aged 12 and under, reflecting authentic communication patterns in real-world scenarios. All audio samples are transcribed and include rich metadata such as speaker ID, gender, age, and accent information. The dataset was collected from diverse speakers across multiple geographic regions, which helps improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets are GDPR, CCPA, and PIPL compliant.

russian children speech dataset kids speech dataset russian child speech recognition dataset pediatric speech dataset russian ASR dataset children

162 Hours - French(France) Children Real-world Casual Conversation and Monologue speech dataset

French (France) children's real-world casual conversation and monologue speech dataset that mirrors real-world interactions. Transcribed with text content, speaker ID, gender, age, accent, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (children aged 12 and under), which helps improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets are GDPR, CCPA, and PIPL compliant.

French Spontaneous Speech Child

346 Hours - Spanish(Mexico) Spontaneous Dialogue Smartphone speech dataset

Spanish (Mexico) spontaneous dialogue smartphone speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed with text content, speaker ID, gender, age, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (338 native speakers), which helps improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets are GDPR, CCPA, and PIPL compliant.

Spanish Mexican Conversation Phone

80 Hours - French(Canada) Spontaneous Dialogue Smartphone speech dataset

French (Canada) spontaneous dialogue smartphone speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed with text content, speaker ID, gender, age, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (126 native speakers), which helps improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets are GDPR, CCPA, and PIPL compliant.

French Conversational Phone Canada

406 Hours - Portuguese(European) Spontaneous Dialogue Smartphone Speech Dataset

Portuguese (Portugal) spontaneous dialogue smartphone speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed with text content, speaker ID, gender, age, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (590 native speakers), which helps improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets are GDPR, CCPA, and PIPL compliant.

Portuguese European Mobile Phone

101 Hours - Italian(Italy) Children Real-world Casual Conversation and Monologue speech dataset

Italian (Italy) children's real-world casual conversation and monologue speech dataset that mirrors real-world interactions. Transcribed with text content, speaker ID, gender, age, accent, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (children aged 12 and under), which helps improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets are GDPR, CCPA, and PIPL compliant.

Italian Casual Conversation Monologue Asr

97 Hours – German Children Speech Dataset (Conversations & Monologues)

The 97-hour German Children Speech Dataset is transcribed with text content, speaker ID, gender, age, accent, and other attributes. The dataset was collected from a large, geographically diverse pool of speakers (children aged 12 and under), which helps improve model performance on real-world, complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. All our datasets are GDPR, CCPA, and PIPL compliant.

German children speech dataset German kids speech recognition German child speech corpus German ASR dataset children German kids voice dataset German conversational speech children German child dialogue dataset German children NLP dataset German child language dataset multilingual children speech data
