197 Hours - Korean Scripted Monologue Smartphone speech dataset

Read aloud in Korean
collect voice data by mobile phone

Korean Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering economy, entertainment, news, informal language, numbers, alphabet and other domains. Transcribed with text content and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (291 people in total, from South Korea and North Korea), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets are all GDPR, CCPA, and PIPL compliant.

Paid Datasets
This is a paid dataset for commercial use, research purposes and more. Licensed ready-made datasets help jump-start AI projects.
Specifications
Format
16kHz, 16bit, uncompressed wav, mono channel;
Recording condition
Low background noise, without echo;
Content category
economy, entertainment, news, informal language, numbers, alphabet;
Recording device
Ratio of Android smartphones to iPhones is 3.2:1;
Speaker
291 people in total, from South Korea and North Korea; 45% male and 55% female;
Country
South Korea(KOR), North Korea(PRK);
Language(Region) Code
ko-KR, ko-KP;
Language
Korean;
Features of annotation
Transcription text, timestamp, 5 noise symbols, special identifiers;
Accuracy Rate
Sentence Accuracy Rate (SAR) 95% (noise symbols and special identifiers are excluded)
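
As an illustration of the Format row above, the following is a minimal sketch of how a delivered file could be checked against the stated audio format (16kHz, 16bit, uncompressed wav, mono channel). It uses only Python's standard `wave` module. The function name `matches_spec` and the constant names are hypothetical and are not part of any vendor tooling.

```python
import wave

# Expected values taken from the "Format" row of the specification.
EXPECTED_SAMPLE_RATE = 16000  # 16 kHz
EXPECTED_SAMPLE_WIDTH = 2     # 16 bit = 2 bytes per sample
EXPECTED_CHANNELS = 1         # mono channel


def matches_spec(path):
    """Return True if the WAV file is 16 kHz, 16-bit, mono and uncompressed.

    Hypothetical helper for illustration only.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_SAMPLE_RATE
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
            and wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getcomptype() == "NONE"  # "NONE" means uncompressed PCM
        )
```

Calling `matches_spec("some_recording.wav")` on each file (the file name here is a placeholder) gives a quick sanity check before training; it does not verify the recording conditions such as background noise or echo.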
Sample
  • Audio

    성종은 자신의 맏아들을 낳은 아내를 내쫓아 죽였다.

  • Audio

    군[[lipsmack]] 산상고도 신정고를 구 대 칠로 물리치고 준준결승에 진출했습니다.

  • Audio

    현재까지 제기된 모든 의혹에 대해서 철저히 조사할 방침입니다.

  • Audio

    굳이 선택해야 한다면 당선가능성과 정체성이 반반이다.

  • Audio

    맨눈으로 가상 현실을 느낄 수 있는 기술입니다.
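
The second sample above contains an inline noise symbol, `[[lipsmack]]`, and the specification states that noise symbols and special identifiers are excluded from the Sentence Accuracy Rate. The sketch below shows one way such tags could be stripped before comparing transcripts. It assumes that every noise symbol uses the double-bracket form seen in the sample, and that a sentence counts as accurate only on an exact match after stripping; the page does not define either point, so both are assumptions. The function names are hypothetical.

```python
import re

# Assumption: all noise symbols look like [[name]], as in the [[lipsmack]] sample.
NOISE_TAG = re.compile(r"\[\[[^\[\]]+\]\]")


def strip_noise_tags(text):
    """Remove [[...]] tags and collapse the whitespace left behind."""
    without_tags = NOISE_TAG.sub("", text)
    return " ".join(without_tags.split())


def sentence_accuracy(references, hypotheses):
    """Fraction of sentence pairs that match exactly once tags are removed.

    Assumption: this exact-match definition is illustrative, not the
    vendor's documented SAR procedure.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    if not references:
        raise ValueError("at least one sentence pair is required")
    correct = sum(
        strip_noise_tags(ref) == strip_noise_tags(hyp)
        for ref, hyp in zip(references, hypotheses)
    )
    return correct / len(references)
```

Stripping tags on both sides keeps the comparison symmetric, so a transcript that omits a noise symbol is not penalized for it.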

Recommended Datasets
849 Hours - Mandarin Chinese(China) Human-Machine Interaction Scripted Monologue Smartphone speech dataset

Mandarin Chinese(China) Human-Machine Interaction Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering commonly used sentences, smart home commands, intelligent assistant, wake words, numbers and other domains. Transcribed with text content and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (998 people in total), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets are all GDPR, CCPA, and PIPL compliant.

Mandarin home interaction mobile phone language audio data far field home collected audio data subset command words wake-up words
1,002 Hours - Russian(Russia) Scripted Monologue Smartphone speech dataset

Russian(Russia) Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering generic domain, human-machine interaction, smart home command and in-car command, numbers and other domains. Transcribed with text content and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (1,960 people in total), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets are all GDPR, CCPA, and PIPL compliant.

Russian video data
762 Hours - Spanish(Latin America) Scripted Monologue Smartphone speech dataset

Spanish(Latin America) Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering generic domain, human-machine interaction, smart home command and in-car command, numbers, news and other domains. Transcribed with text content and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (1,630 people in total, such as Mexicans, Colombians, etc.), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets are all GDPR, CCPA, and PIPL compliant.

Spanish audio data
1,044 Hours - Portuguese(Brazil) Scripted Monologue Smartphone speech dataset

Portuguese(Brazil) Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering generic domain, human-machine interaction, smart home command and in-car command, numbers, news and other domains. Transcribed with text content, speaker ID, timestamp and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (2,038 people in total), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets are all GDPR, CCPA, and PIPL compliant.

Portuguese collection Portuguese data Portuguese identification asr Speech to text text to speech
986 Hours - Portuguese(Europe) Scripted Monologue Smartphone speech dataset

Portuguese(Europe) Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering generic domain, human-machine interaction, smart home command and in-car command, numbers, news and other domains. Transcribed with text content and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (2,109 people in total), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets are all GDPR, CCPA, and PIPL compliant.

Portuguese data mobile phone voice data voice data
769 Hours - French(France) Scripted Monologue Smartphone speech dataset

French(France) Scripted Monologue Smartphone speech dataset, collected from monologues based on given prompts, covering the general category and the human-machine interaction category. Transcribed with text content. The dataset was collected from a large and geographically diverse pool of speakers (1,623 native speakers), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets are all GDPR, CCPA, and PIPL compliant.

French mobile phones collect voice data French collection French voice data and developmental voice recognition data
435 Hours - Spanish(Spain) Scripted Monologue Smartphone speech dataset

Spanish(Spain) Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering generic domain, human-machine interaction, smart home command and in-car command, numbers, news and other domains. Transcribed with text content and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (989 people in total), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets are all GDPR, CCPA, and PIPL compliant.

Spanish data Spanish pronunciation Spanish acquisition Spanish identification data
831 Hours - English(the United Kingdom) Scripted Monologue Smartphone speech dataset

English(the United Kingdom) Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering generic domain, human-machine interaction, smart home command and in-car command, numbers and other domains. Transcribed with text content and other attributes. The dataset was collected from a large and geographically diverse pool of speakers (1,651 British people in total), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets are all GDPR, CCPA, and PIPL compliant.

Mobile Telephony British English Speech Data