Format: 16kHz, 16bit, uncompressed wav, mono channel
Recording condition: Low background noise (indoor), without echo
Content category: Generic domain; human-machine interaction; numbers; Shanghai POI
Recording device: Android smartphone, iPhone
Speaker: 2,956 people in total, 35% male and 65% female
Country: China (CHN)
Language: Shanghai dialect
Features of annotation: Transcription text; special identifiers, 4 noise symbols
Accuracy rate: Sentence Accuracy Rate (SAR) 95% (noise symbols and other identifiers are excluded)
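The stated audio format (16kHz, 16bit, uncompressed wav, mono channel) can be checked against a delivered file before training. The following is a minimal sketch using Python's standard `wave` module; the function name `matches_spec` and its parameters are hypothetical and not part of any vendor tooling.

```python
import wave


def matches_spec(path, rate=16000, sample_width_bytes=2, channels=1):
    """Return True if the WAV header reports 16 kHz, 16-bit, mono, uncompressed PCM.

    Hypothetical helper: the defaults mirror the format stated for this dataset.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == rate                    # 16 kHz sampling rate
            and wav.getsampwidth() == sample_width_bytes  # 2 bytes = 16 bit
            and wav.getnchannels() == channels            # mono
            and wav.getcomptype() == "NONE"               # uncompressed PCM
        )
```

A file that fails this check may have been resampled or converted after delivery, so it is worth inspecting before it is mixed into a training set.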
1,030 Hours - Shanghai Dialect (China) Scripted Monologue Smartphone speech dataset
2,956 speakers, all of whom are native Shanghai dialect speakers. All speakers read the text in Shanghai dialect. Recording content: extensive content covering multi-field customer consultancy, short messages, numbers, and Shanghai POI. The sentences are manually transcribed and checked by professional annotators, with high accuracy.
Key information: 2,956 people; quiet indoor; 16kHz, 16bit, wav
Tags: China, Dialect, Shanghai, Smartphone, Reading, Scripted Monologue
Keywords: Shanghai dialect; Shanghai dialect collection; mobile phone collection of voice data
Sample transcripts:
T0065G0089S0072.wav: 侬晓得哪能申请QQ号头[n]
T0065G0089S0076.wav: [s]侬最近有啥烦恼个事体伐[n]
T0065G0001S0005.wav: 我也没伊电话[n]
T0065G0014S0011.wav: 今朝热火队赢球了伐
T0065G0029S0036.wav: 播放王菲个歌[s]曲
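The sample transcripts carry bracketed markers such as `[n]` and `[s]`, and the stated accuracy rate excludes noise symbols and other identifiers. The sketch below shows one way to strip such markers and compute a sentence-level match rate. It rests on two assumptions that the page does not confirm: that every marker is a run of lowercase letters in square brackets, and that sentence accuracy means the share of sentences whose text matches exactly once markers are removed. The names `strip_tags` and `sentence_accuracy` are hypothetical.

```python
import re

# Assumption: markers look like [n] or [s]; the full set of symbols is not listed on the page.
TAG = re.compile(r"\[[a-z]+\]")


def strip_tags(transcript):
    """Remove bracketed markers and surrounding whitespace from a transcript."""
    return TAG.sub("", transcript).strip()


def sentence_accuracy(hypotheses, references):
    """Fraction of sentences that match exactly after markers are removed.

    Assumed definition, for illustration only.
    """
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        strip_tags(hyp) == strip_tags(ref)
        for hyp, ref in zip(hypotheses, references)
    )
    return correct / len(references)
```

For example, `strip_tags("播放王菲个歌[s]曲")` yields the plain text with the marker removed, so a transcript that differs from its reference only in markers still counts as a match.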
Shanghai Dialect (China) Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering generic domain, human-machine interaction, numbers, and Shanghai POI. Transcribed with text content and other attributes. Our dataset was collected from an extensive and geographically diverse pool of speakers (2,956 speakers from Shanghai), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets all comply with GDPR, CCPA, and PIPL.
This is a paid dataset for commercial use, research purposes, and more. Licensed ready-made datasets help jump-start AI projects.
Mandarin Chinese (China) Heavy Accent Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering generic domain, human-machine interaction, smart home command and control, in-car command and control, numbers, and other domains. Transcribed with text content and other attributes. Our dataset was collected from an extensive and geographically diverse pool of speakers (2,444 people in total, mainly from southern China, with some from northern China), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets all comply with GDPR, CCPA, and PIPL.
English (China) Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering generic domain, informal English, human-machine interaction, and other domains. Transcribed with text content and other attributes. Our dataset was collected from an extensive and geographically diverse pool of speakers (1,279 people in total, covering 7 dialect regions across China), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets all comply with GDPR, CCPA, and PIPL.
English (the United States) Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering generic domain, human-machine interaction, smart home command and in-car command, numbers, and other domains. Transcribed with text content and other attributes. Our dataset was collected from an extensive and geographically diverse pool of speakers (1,842 Americans in total), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets all comply with GDPR, CCPA, and PIPL.
Malay (Malaysia) Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering generic domain, human-machine interaction, smart home command and in-car command, news, and other domains. Transcribed with text content and other attributes. Our dataset was collected from an extensive and geographically diverse pool of speakers (675 people in total), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets all comply with GDPR, CCPA, and PIPL.
Indonesian (Indonesia) Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering generic domain, human-machine interaction, smart home command and in-car command, news, and other domains. Transcribed with text content and other attributes. Our dataset was collected from an extensive and geographically diverse pool of speakers (1,285 people in total), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets all comply with GDPR, CCPA, and PIPL.
English (Spain) Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering generic domain, human-machine interaction, smart home command and in-car command, numbers, and other domains. Transcribed with text content and other attributes. Our dataset was collected from an extensive and geographically diverse pool of speakers (891 people in total), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets all comply with GDPR, CCPA, and PIPL.
English (France) Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering generic domain, human-machine interaction, smart home command and in-car command, numbers, and other domains. Transcribed with text content and other attributes. Our dataset was collected from an extensive and geographically diverse pool of speakers (1,089 people in total), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets all comply with GDPR, CCPA, and PIPL.
English (Germany) Scripted Monologue Smartphone speech dataset, collected from monologues based on given scripts, covering generic domain, human-machine interaction, smart home command and in-car command, numbers, and other domains. Transcribed with text content and other attributes. Our dataset was collected from an extensive and geographically diverse pool of speakers (1,162 people in total), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage. Our datasets all comply with GDPR, CCPA, and PIPL.