155 Hours - Lip Sync Multimodal Video Data

Voice recordings and matching lip-movement video of 249 speakers, captured simultaneously on multiple devices and precisely aligned by a pulse signal for high accuracy. The data can be used for research on multimodal learning algorithms in the speech and image fields.
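The page does not describe how the pulse signal is used for alignment. As a rough illustration only, the sketch below assumes each device records the same sharp sync pulse on its audio track, and estimates the offset between two tracks from the first sample that crosses a threshold. The function names, the threshold, and the sample values are hypothetical.

```python
def first_pulse_index(samples, threshold):
    """Index of the first sample whose absolute value reaches `threshold`."""
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            return i
    raise ValueError("no pulse found above threshold")


def offset_seconds(track_a, track_b, sample_rate_hz, threshold):
    """How much later the pulse appears in track_b than in track_a, in seconds.

    Assumption: both tracks share the same sample rate and contain the
    same sync pulse. A negative result means track_b started later,
    so its pulse appears earlier.
    """
    delta = first_pulse_index(track_b, threshold) - first_pulse_index(track_a, threshold)
    return delta / sample_rate_hz
```

Trimming the computed offset from the start of the later track would bring the two recordings onto a common timeline under these assumptions.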

This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.

Specifications

Format

Video: mp4 format, 1,280*720; Audio: wav format, 16 kHz, 16-bit, mono
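A minimal sketch of how a downloaded audio file could be checked against the stated format, using Python's standard `wave` module. It assumes the stated sample rate means 16 kHz; the function name `audio_matches_spec` is hypothetical and not part of any tooling shipped with the dataset.

```python
import wave

# Expected audio parameters, taken from the Format line above.
EXPECTED_RATE_HZ = 16000   # assumed to mean 16 kHz
EXPECTED_SAMPLE_WIDTH = 2  # 16 bit = 2 bytes per sample
EXPECTED_CHANNELS = 1      # mono


def audio_matches_spec(path):
    """Return True if the WAV file at `path` is 16 kHz, 16-bit, mono."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH
            and wav.getnchannels() == EXPECTED_CHANNELS
        )
```

Running a check like this over every file before training can catch resampled or stereo files early.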

Recording Environment

A quiet, sunny room is used to simulate daytime outdoor driving scenes; signal-to-noise ratio 20~25 dB
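For reference, a signal-to-noise ratio in decibels is conventionally 10 times the base-10 logarithm of the ratio of signal power to noise power. The sketch below computes it from two lists of sample amplitudes and checks the result against the 20 to 25 dB range stated above. How the vendor actually measured SNR is not described, so the function names and the choice of speech and noise segments are assumptions.

```python
import math


def snr_db(signal, noise):
    """Signal-to-noise ratio in dB from two sequences of sample amplitudes."""
    # Mean power is the average of the squared amplitudes.
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = sum(n * n for n in noise) / len(noise)
    return 10.0 * math.log10(signal_power / noise_power)


def in_stated_range(value_db, low=20.0, high=25.0):
    """True if `value_db` lies within the SNR range given on this page."""
    return low <= value_db <= high
```

For example, a signal whose amplitude is ten times that of the noise has 100 times the power, which corresponds to 20 dB.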

Recording Scenes

Divided into main scenes and sub-scenes according to sunlight intensity

Recording Content

Short signals and spoken sentences

Speaker

249 Chinese speakers, gender-balanced

Recording Device

Camera, HD microphone, Audio board

Recording angle

Video is recorded from six angles (front face, single side face, looking up, looking down, side face looking down, and side face looking up), with near-field and far-field audio recorded at the same time

Language

Mandarin

Application scenario

Lip language recognition

Accuracy

Sentence accuracy is no lower than 95%

Recommended Dataset

531 Hours - In-Car Noise Data by Microphone and Mobile Phone

531 hours of noise data recorded in in-car scenes. It covers various vehicle models, road types, vehicle speeds, and car window open/closed conditions. Six recording points are placed to capture the noise at different positions in the vehicle and accurately match vehicle noise modeling requirements.

245 Hours - Mandarin Chinese(China) In Car Scripted Monologue Smartphone speech dataset

The Mandarin Chinese (China) In-Car Scripted Monologue Smartphone speech dataset was collected from monologues based on given scripts, covering short messages, news, and 30+ customer consulting domains, recorded in cars. It is transcribed with text content and other attributes. The dataset was collected from an extensive and geographically diverse pool of speakers (695 people), enhancing model performance in real and complex tasks. Quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring that user privacy and legal rights are maintained throughout data collection, storage, and usage; our datasets are all GDPR, CCPA, and PIPL compliant.
