en

Please fill in your name

Mobile phone format error

Please enter the telephone

Please enter your company name

Please enter your company email

Please enter the data requirement

Successful submission! Thank you for your support.

Format error, Please fill in again

Confirm

The data requirement cannot be less than 5 words and cannot be pure numbers

m.nexdata.datatang.com

Case Study: British Native Lip-Reading Multimodal Project

From:Nexdata Date: 10/30/2025

The challenge seemed straightforward: capture 1080p/30fps lip movements of 300 Britons with balanced gender representation and regional accents. But behind the final MOV format output lay many complex aspects, including technical calibration, cultural sensitivity, and process standardizationthe very core of this groundbreaking lip-reading multimodal project.


Project Overview

This multimodal dataset initiative set out to create high-fidelity lip-reading resources by recording 300 British natives (50% male, 50% female) across key linguistic regions: London, Ireland, Scotland, and Wales. Each participant contributed video footage in 1080p resolution at 30fps, delivered in MOV format with a strict word accuracy requirement of ≥95%. The geographic spread was intentionally designed to capture the linguistic diversity of the British Isles, from Received Pronunciation to regional dialects.


Execution Framework

The project employed a four-phase operational model:

Recruitment: Based on a 1:1 gender ratio and regional coverage, and considering the impact of recording equipment on video quality, we limited recruitment to UK residents with Apple devices, recruiting qualified personnel through various effective channels.

Training: We distributed Apple mobile phone video recording guidelines to participants and provided relevant training, detailing the basic requirements and precautions for recording.

Supervision: Throughout the process, we assigned dedicated personnel to provide guidance and supervision, ensuring the recording work proceeded smoothly and according to plan.

Delivery: After recording, we organized personnel to export the data from the phones and upload it to Google Drive according to the data upload guidelines, followed by data processing.

This standardized approach ensured that technical variables were minimized while preserving the natural linguistic characteristics of participants.


Critical Challenges & Solutions

1.Corpus Distribution

Challenge: Uneven text length across recording scripts
Solution: Algorithmic word-count balancing that divided materials into equal linguistic units rather than simple file splits, ensuring consistent workload distribution.

2.Device Calibration

Challenge: Inconsistent video parameters across different devices
Solution: Developed device-specific configuration guides with step-by-step screenshots for Apple models to lock the resolution and frame rate.

3. Data Pipeline Security 

Challenge: Large file transfer reliability
Solution: We developed a detailed data upload guide, including screenshots of the steps to export data from a mobile phone and upload it to Google Drive. Personnel were required to strictly follow the guide, exporting the data from their phones first, and then uploading it to Google Drive. Staff supervised and guided the upload process, and data quality was checked promptly after upload.

4. Recording Errors and Stuttering

Challenge: Natural speech capture resulted in numerous errors and stuttering, impacting video quality and increasing post-editing workload
Solution: Implemented graduated familiarization periods where participants reviewed texts for 15 minutes pre-recording, balancing accuracy with authentic delivery.


Key Learnings

Three foundational insights emerged from the project:

Strategic Planning: Geographic accent mapping during pre-production reduced regional bias by 40%;

Resource Standardization: Device-specific configuration templates cut setup time by 65%;

Communication Protocols: Daily check-in mechanisms identified 83% of potential quality issues before final recording;

These operational refinements not only ensured project delivery within the 95% accuracy benchmark but established a replicable framework for future multimodal linguistic data collection.


Conclusion

This lip-reading dataset project demonstrates how technical rigor and cultural sensitivity can coexist in multimodal data collection. By prioritizing regional linguistic diversity alongside technical precision, the initiative delivered a resource that balances scientific utility with authentic human expression.


Nexdata offers customized data service, if you have any requirements about data collection/ data annotation, please dont hesitate to reach out to us at [email protected].

Nexdata会社情報・AI開発に役立つ事例・業界レポートをダウンロードできます。

今すぐチェック
a6b9bd5d-bef4-41cf-a896-be409e7f4dcb