155 Hours - Lip Sync Multimodal Video Data
Key information: 249 people, 25 provinces and cities, balanced in gender
Speech audio and matching lip-movement video of 249 speakers, recorded simultaneously on multiple devices and precisely aligned using a pulse signal. The data can be used for research on multimodal learning algorithms in the speech and image fields.
This is a paid dataset for commercial use, research, and more. Licensed, ready-made datasets help jump-start AI projects.
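The description mentions that recordings from different devices are aligned through a pulse signal, but it does not say how. As a rough illustration only, one simple way to use a shared pulse is to locate its onset in each track and take the difference in time. The function names, the fixed amplitude threshold, and the onset rule below are all assumptions for this sketch, not the vendor's method.

```python
def pulse_onset(samples, threshold):
    """Return the index of the first sample whose magnitude reaches
    `threshold`, or None if no sample does (hypothetical onset rule)."""
    for index, sample in enumerate(samples):
        if abs(sample) >= threshold:
            return index
    return None


def offset_seconds(track_a, rate_a, track_b, rate_b, threshold):
    """Time of the pulse in track A minus its time in track B, in seconds.

    A positive result means the pulse appears later in track A, so track A
    started recording earlier relative to the pulse.
    """
    onset_a = pulse_onset(track_a, threshold)
    onset_b = pulse_onset(track_b, threshold)
    if onset_a is None or onset_b is None:
        raise ValueError("pulse not found in one of the tracks")
    return onset_a / rate_a - onset_b / rate_b
```

Shifting one track by the returned offset would bring the two pulses into line; a real pipeline would likely need a more robust detector than a single threshold.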
Specifications
Format
Video: MP4, 1,280 × 720; Audio: WAV, 16 kHz, 16-bit, mono
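A delivered audio file can be checked against the stated format with Python's standard `wave` module. This is a minimal sketch that assumes the stated sampling rate means 16 kHz (16,000 samples per second); the constant and function names are chosen here for illustration.

```python
import wave

# Audio parameters taken from the Format specification.
EXPECTED_RATE_HZ = 16_000   # assumption: the stated rate means 16 kHz
EXPECTED_SAMPLE_WIDTH = 2   # 16-bit audio is 2 bytes per sample
EXPECTED_CHANNELS = 1       # mono


def wav_matches_spec(path):
    """Return True if the WAV file at `path` is 16 kHz, 16-bit, mono."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
            and wav.getnchannels() == EXPECTED_CHANNELS
        )
```

The check reads only the WAV header fields, so it is cheap to run over a whole delivery before any further processing.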
Recording Environment
A quiet, sunlit room used to simulate daytime outdoor driving scenes; signal-to-noise ratio 20 to 25 dB
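Signal-to-noise ratio in decibels is 10 · log10(P_signal / P_noise), where P is mean power. The sketch below estimates it from a speech segment and a noise-only segment and checks the result against the stated 20 to 25 dB range. How the vendor measures SNR is not stated; the function names and the choice of segments are assumptions, and because the speech segment also contains noise, this estimate runs slightly high.

```python
import math


def mean_power(samples):
    """Mean of squared sample values."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech, noise):
    """Estimated SNR in dB from a speech segment and a noise-only segment."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def within_stated_range(value_db, low=20.0, high=25.0):
    """True if `value_db` lies in the 20 to 25 dB range given in the spec."""
    return low <= value_db <= high
```

For example, a speech segment with ten times the amplitude of the noise segment has one hundred times its power, which corresponds to 20 dB.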
Recording Scenes
Divided into main scenes and sub-scenes according to sunlight intensity
Recording Content
Short signals and spoken sentences
Speaker
249 Chinese speakers, gender-balanced
Recording Device
Camera, HD microphone, Audio board
Recording angle
Video recorded from six angles (front face, side face, looking up, looking down, side face looking down, and side face looking up), with near-field and far-field audio recorded at the same time