Format: 48kHz, 16bit, uncompressed wav, mono channel
Content category: Wake-up words
Recording condition: Professional recording studio
Recording device: Microphone
Speaker: 1,027 Chinese in total, 48% male and 52% female
Country: China (CHN)
Language: Mandarin Chinese, English
Features of annotation: Transcription text, accent, birthplace, gender
1,027 People - Mandarin Chinese and English Wake-up Words Scripted Monologue Microphone speech dataset
wake-up words
wake-up
Mandarin Chinese and English Wake-up Words Scripted Monologue Microphone speech dataset, collected from monologues based on given wake-up words and covering three speech rates: slow, normal, and fast. Transcribed and annotated with text content, accent, birthplace, gender and other attributes. The dataset was collected in a professional recording studio from a large and geographically diverse group of speakers (1,027 Chinese speakers), with the aim of enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. Our datasets all comply with GDPR, CCPA, and PIPL.
This is a paid dataset for commercial use, research purposes and more. Licensed ready-made datasets help jump-start AI projects.
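Buyers who want to confirm that a delivered clip matches the format listed for this dataset (48kHz, 16bit, uncompressed wav, mono channel) could run a check along the following lines. This is a minimal sketch using only Python's standard `wave` module; the function names `matches_listed_format` and `write_silence` are hypothetical helpers introduced for illustration, and the silent clip is a stand-in, not dataset content.

```python
import wave

# Expected values taken from the listing's "Format" field.
EXPECTED_SAMPLE_RATE_HZ = 48_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
EXPECTED_CHANNELS = 1            # mono


def matches_listed_format(path):
    """Return True if the WAV file at `path` is 48 kHz, 16-bit, mono, uncompressed PCM."""
    with wave.open(path, "rb") as wav_file:
        return (
            wav_file.getframerate() == EXPECTED_SAMPLE_RATE_HZ
            and wav_file.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav_file.getnchannels() == EXPECTED_CHANNELS
            and wav_file.getcomptype() == "NONE"
        )


def write_silence(path, sample_rate_hz, channels, seconds=0.1):
    """Write a short silent 16-bit WAV file (hypothetical stand-in for a real clip)."""
    frame_count = int(sample_rate_hz * seconds)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(b"\x00\x00" * frame_count * channels)
```

In practice `matches_listed_format` would be called on each file in a delivery, and any file returning `False` flagged for review before training.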
Indonesian (Indonesia) Spontaneous Dialogue Smartphone speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed and annotated with text content, speaker ID, gender, age and other attributes. The dataset was collected from a large and geographically diverse group of speakers (412 native speakers), with the aim of enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. Our datasets all comply with GDPR, CCPA, and PIPL.
audio data dataset conversational asr data Indonesian
Filipino (the Philippines) Spontaneous Dialogue Smartphone speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed and annotated with text content, speaker ID, gender, age and other attributes. The dataset was collected from a large and geographically diverse group of speakers (140 native speakers), with the aim of enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. Our datasets all comply with GDPR, CCPA, and PIPL.
audio data dataset conversational asr data Filipino
Spanish (Spain) Spontaneous Dialogue Telephony speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed and annotated with text content, speaker ID, gender, age and other attributes. The dataset was collected from a large and geographically diverse group of speakers (600 native speakers), with the aim of enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. Our datasets all comply with GDPR, CCPA, and PIPL.
French (France) Spontaneous Dialogue Telephony speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed and annotated with text content, speaker ID, gender, age and other attributes. The dataset was collected from a large and geographically diverse group of speakers (964 native speakers), with the aim of enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. Our datasets all comply with GDPR, CCPA, and PIPL.
Italian (Italy) Spontaneous Dialogue Telephony speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed and annotated with text content, speaker ID, gender, age and other attributes. The dataset was collected from a large and geographically diverse group of speakers (676 native speakers), with the aim of enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. Our datasets all comply with GDPR, CCPA, and PIPL.
Italian Conversational telephone
444,202 Korean Pronunciation Dictionary
The data contains 444,202 entries. All words and pronunciations are produced by Korean linguists. It can be used in the research and development of Korean ASR technology.
Thai (Thailand) Spontaneous Dialogue Telephony speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed and annotated with text content, speaker ID, gender, age and other attributes. The dataset was collected from a large and geographically diverse group of speakers (1,986 native speakers), with the aim of enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. Our datasets all comply with GDPR, CCPA, and PIPL.
Portuguese (Brazil) Spontaneous Dialogue Smartphone speech dataset, collected from dialogues based on given topics and covering 20+ domains. Transcribed and annotated with text content, speaker ID, gender, age and other attributes. The dataset was collected from a large and geographically diverse group of speakers (142 native speakers), with the aim of enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage. Our datasets all comply with GDPR, CCPA, and PIPL.
Conversational speech Portuguese asr data russian asr dataset Brazilian Portuguese