m.nexdata.datatang.com

202 People Lip Reading Multimodal Video Dataset – Multi-Angle Mouth Movements

lip reading dataset
multimodal video data
Mandarin lip dataset
multi-angle lip video
mouth movement dataset
talking face dataset
visual speech recognition
audio-visual speech dataset
lip sync video corpus
AVSR training data

This dataset features 202 participants recorded with smartphones from 13 distinct angles under both indoor natural light and indoor fluorescent lighting. It includes high-resolution multimodal video of Mandarin Chinese speech with diverse speaker demographics and natural lip movements. The recorded content is general-domain and unconstrained, making the dataset suitable for tasks such as lip reading, audio-visual speech recognition (AVSR), visual speech synthesis, lip-sync modeling, and other multimodal machine learning applications. It complies with the GDPR, CCPA, and PIPL privacy regulations and has been validated by leading AI enterprises.

Paid Datasets
This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Data size
202 people; for each person, audio and video data from 13 different angles plus 1 .txt document
People distribution
race distribution: Asian (Indonesia); gender distribution: 89 males, 113 females; age distribution: 165 people aged 18-30, 32 people aged 31-45, and 5 people aged 46-60
Collecting environment
indoor natural light scenes, indoor fluorescent lamp scenes
Data diversity
including multiple scenes, different ages, different shooting angles
Device
cellphone; the resolution is 1,920 x 1,080
Collecting angle
audio and video data were collected simultaneously from 13 different angles: front face, 3 left side face angles, 3 right side face angles, looking down, looking up, left side face down, right side face down, left side face up, and right side face up
Recording content
general field, unlimited content
Language
Mandarin Chinese; each video is longer than 20 seconds
Data format
the video format is .mp4; the audio is sampled at 16 kHz or higher, 16-bit; the frame rate is 25-30 fps
Accuracy rate
the word accuracy rate is more than 95%
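
The specification above can be turned into a simple acceptance check when receiving the data. The following is a minimal sketch in Python, not the vendor's tooling: the angle labels, the `ClipInfo` field names, and the helper functions are all hypothetical, and the metadata values are assumed to have been read from each .mp4 file beforehand with a tool of your choice. It also assumes that either portrait or landscape orientation is acceptable for the 1,920 x 1,080 resolution, which the specification does not state.

```python
from dataclasses import dataclass

# Hypothetical labels for the 13 collecting angles named in the specification.
ANGLES = (
    ["front"]
    + [f"left_side_{i}" for i in (1, 2, 3)]
    + [f"right_side_{i}" for i in (1, 2, 3)]
    + ["looking_down", "looking_up",
       "left_side_down", "right_side_down",
       "left_side_up", "right_side_up"]
)


@dataclass
class ClipInfo:
    """Metadata for one clip; field names are illustrative, not the vendor's schema."""
    angle: str
    extension: str
    width: int
    height: int
    fps: float
    audio_sample_rate_hz: int
    audio_bit_depth: int
    duration_s: float


def spec_violations(clip: ClipInfo) -> list[str]:
    """Return a list of ways the clip deviates from the published specification."""
    problems = []
    if clip.angle not in ANGLES:
        problems.append(f"unknown angle: {clip.angle}")
    if clip.extension.lower() != ".mp4":
        problems.append(f"unexpected format: {clip.extension}")
    # Assumption: portrait and landscape are both acceptable.
    if {clip.width, clip.height} != {1920, 1080}:
        problems.append(f"unexpected resolution: {clip.width}x{clip.height}")
    if not 25 <= clip.fps <= 30:
        problems.append(f"frame rate outside 25-30 fps: {clip.fps}")
    if clip.audio_sample_rate_hz < 16000:
        problems.append(f"audio below 16 kHz: {clip.audio_sample_rate_hz}")
    if clip.audio_bit_depth != 16:
        problems.append(f"audio is not 16-bit: {clip.audio_bit_depth}")
    if clip.duration_s <= 20:
        problems.append(f"clip not longer than 20 s: {clip.duration_s}")
    return problems


def missing_angles(clips: list[ClipInfo]) -> list[str]:
    """List the angles that are absent from one participant's set of clips."""
    seen = {clip.angle for clip in clips}
    return [angle for angle in ANGLES if angle not in seen]
```

For one participant, calling `spec_violations` on each clip and `missing_angles` on the full set gives a quick report of anything that deviates from the stated format or the 13-angle coverage.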
Sample
  • 202 People Lip Reading Multimodal Video Dataset – Multi-Angle Mouth Movements