155 Hours – Lip Sync Multimodal Video Data

Tags: Lip Language; Multimodal; Mandarin; Reading; Mobile Phone; Video camera

Speech recordings with matching lip-movement video from 249 speakers, captured simultaneously on multiple devices and aligned precisely by pulse signal with high accuracy. The dataset can be used for research on multimodal learning algorithms in the speech and image fields.
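The page does not describe how the pulse-signal alignment is performed. As a rough illustration only, the sketch below estimates the time offset between two tracks by locating the first sample that reaches a threshold in each one. The function names, the threshold value, and the assumption that both tracks share one sample rate are hypothetical and are not taken from the dataset documentation.

```python
def find_pulse_onset(samples, threshold):
    """Return the index of the first sample whose magnitude reaches the threshold, or None."""
    for index, value in enumerate(samples):
        if abs(value) >= threshold:
            return index
    return None


def offset_seconds(reference, other, sample_rate_hz, threshold=0.5):
    """Estimate how many seconds the pulse in `other` lags the pulse in `reference`.

    Both tracks are assumed (hypothetically) to be sampled at `sample_rate_hz`
    and to contain one clear sync pulse. A negative result means `other` leads.
    """
    ref_onset = find_pulse_onset(reference, threshold)
    other_onset = find_pulse_onset(other, threshold)
    if ref_onset is None or other_onset is None:
        raise ValueError("sync pulse not found in one of the tracks")
    return (other_onset - ref_onset) / sample_rate_hz
```

Once the offset is known, the later track can be trimmed by that amount so the recordings from different devices start at the same instant. A production pipeline would more likely use cross-correlation, which is more robust to noise than a simple threshold.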

This is a paid dataset for commercial use, research, and more. Licensed, ready-made datasets help jump-start AI projects.

Specifications

Format: video in mp4, 1,280 × 720; audio in wav, 16 kHz, 16-bit, mono
Recording Environment: a quiet, sunny room used to simulate daytime outdoor driving scenes; signal-to-noise ratio 20 to 25 dB
Recording Scenes: divided into major scenes and sub-scenes according to sunlight intensity
Recording Content: short signals and spoken sentences
Speaker: 249 Chinese speakers, gender-balanced
Recording Device: camera, HD microphone, audio board
Recording Angle: video from six angles (front face, single side face, looking up, looking down, side face looking down, side face looking up), with near-field and far-field audio recorded at the same time
Language: Mandarin
Application Scenario: lip language recognition
Accuracy: sentence accuracy of at least 95%
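A buyer may want to confirm that delivered audio matches the listed format. The sketch below uses Python's standard `wave` module to compare a file's header against expected values. It assumes the listed sample rate denotes 16 kHz; the `EXPECTED` dictionary and both function names are illustrative and not part of any Nexdata tooling.

```python
import io
import wave

# Expected values based on the listed audio format. Reading the sample-rate
# entry as 16 kHz is an assumption.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 16000}


def check_wav_format(source, expected=EXPECTED):
    """Return a list of mismatches between a WAV header and the expected format.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        actual = {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
        }
    return [
        f"{key}: expected {expected[key]}, got {actual[key]}"
        for key in expected
        if actual[key] != expected[key]
    ]


def make_silent_wav(channels, sample_width, sample_rate, n_frames=160):
    """Build a small in-memory WAV file of silence for demonstration."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * (n_frames * channels * sample_width))
    buffer.seek(0)
    return buffer
```

An empty list from `check_wav_format` means the header matches; each mismatch otherwise appears as one readable message, which makes the function easy to run over a whole delivery folder.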

Recommended Dataset

In-Car Noise Dataset – 531 Hours of Cabin Recordings

This dataset contains 531 hours of in-car ambient noise recordings captured using microphones and mobile phones across various vehicle models, road types, speeds, and cabin conditions such as windows open or closed. The noise was recorded at six distinct points inside each vehicle to reflect spatial diversity and better support vehicle sound modeling. The dataset captures real-world driving environments, engine hums, road interactions, wind noise, and cabin reverberation. It is ideal for use cases such as noise suppression, automatic speech recognition (ASR) in cars, in-vehicle audio enhancement, and sound source separation. Validated by leading AI companies, the dataset complies fully with global data privacy regulations including GDPR, CCPA, and PIPL, making it suitable for both research and commercial applications.

Tags: in-car noise dataset; vehicle interior sound; car ambient noise; automotive audio dataset; cabin noise data; car sound modeling; speech enhancement training data; vehicle noise cancellation dataset
