From: Nexdata    Date: 12/24/2025
Project Background
Use 15 cameras to record videos simultaneously from different angles, and use the RGB information to compute depth maps.
Use a single industrial multi-view camera to capture 3 common outdoor scenes to help verify the image depth estimation algorithm.
Collection Requirements
15 Planar Array Cameras
Collection Equipment: 15 RGB cameras (CANON R10) with a resolution of not less than 1920×1080.
Shooting Distance: 0.5m-3m.
Shooting Content:
| Shooting Content | Number of Cameras | Camera Array | Distance from Scene to Camera Array | Additional Requirements |
| --- | --- | --- | --- | --- |
| Indoor Meeting Scene | 15 | 3×5 2D layout | 3m~10m | Real meeting scene required |
| Small Pet Scene | 15 | 3×5 2D layout | 0.5m~3m | Real pet scene required |
| Outdoor Street Scene | 15 | 3×5 2D layout | 10m~30m | Outdoor environment required |
Other Shooting Requirements:
01. The subjects should have smooth movements and natural expressions. The scene should be reasonably arranged with sufficient lighting and certain foreground-background occlusion relationships.
02. The details of the subjects and scene should be fully captured.
03. The maximum time error between multi-view data should be 33 milliseconds.
Delivery Content:
Deliver texture data and depth data for each video sequence.
Texture data format: YUV 4:2:0, 10-bit; file naming format: "Content_SequenceNumber_#CameraNumber_texture.yuv".
Depth data format: YUV 4:2:0, 16-bit; file naming format: "Content_SequenceNumber_#CameraNumber_depth.yuv", where all numbers should be zero-padded for alignment.
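The naming convention above can be sketched as a small helper. This is a minimal illustration, not the project's tooling; the two-digit zero-padding width is an assumption, since the spec only requires that numbers be zero-padded for alignment.

```python
def delivery_name(content: str, seq: int, cam: int, kind: str) -> str:
    """Build a delivery file name following the pattern
    "Content_SequenceNumber_#CameraNumber_texture.yuv".
    NOTE: two-digit padding is an assumption; the spec only says
    numbers must be zero-padded for alignment."""
    assert kind in ("texture", "depth")
    return f"{content}_{seq:02d}_#{cam:02d}_{kind}.yuv"

print(delivery_name("Meeting", 1, 3, "texture"))  # Meeting_01_#03_texture.yuv
print(delivery_name("Street", 12, 15, "depth"))   # Street_12_#15_depth.yuv
```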
Provide Brown camera model parameters (fx, fy, cx, cy, k1, k2, k3, p1, p2) as computed automatically by Metashape. Use fixed-focus shooting and specify the order of all parameters.
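For reference, the Brown(-Conrady) model maps a 3-D point in the camera frame to pixel coordinates using exactly the parameters listed above. The sketch below assumes the conventional form of the model; the parameter ordering should still be verified against the actual Metashape export, as the text requires.

```python
def project_brown(X, Y, Z, fx, fy, cx, cy, k1, k2, k3, p1, p2):
    """Project a camera-frame 3-D point to pixel coordinates with the
    Brown(-Conrady) distortion model (radial k1..k3, tangential p1, p2).
    Parameter order follows the listing in the spec; confirm it against
    the actual Metashape output."""
    x, y = X / Z, Y / Z                       # normalized image coordinates
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return fx * xd + cx, fy * yd + cy         # pixel coordinates

# With zero distortion, the model reduces to the pinhole projection:
u, v = project_brown(0.5, 0.25, 2.0, 1000, 1000, 960, 540, 0, 0, 0, 0, 0)
print(u, v)  # 1210.0 665.0
```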
Additional Key Points
The 15 planar array cameras should be placed at different angles and triggered simultaneously using hardware trigger lines to ensure synchronized shooting with an error within 33 milliseconds.
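A capture's synchronization budget can be validated with a simple check: the spread between the earliest and latest per-camera trigger timestamps must stay within 33 ms. This is an illustrative sketch; the timestamp source (e.g. the hardware trigger controller's clock) is an assumption.

```python
def sync_ok(timestamps_ms, max_skew_ms=33):
    """Return True if one multi-view capture meets the sync budget:
    the spread between the earliest and latest camera trigger
    timestamps (all in ms, on a common clock) must not exceed
    the spec's 33 ms limit."""
    return max(timestamps_ms) - min(timestamps_ms) <= max_skew_ms

print(sync_ok([0, 5, 12, 30]))  # True  (30 ms spread)
print(sync_ok([0, 5, 12, 40]))  # False (40 ms spread)
```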
Provide Brown camera model parameters to complete calibration across the cameras and obtain extrinsic parameters for subsequent 3D reconstruction.
For 2D camera shooting, since texture reconstruction will be performed later, the models' clothing should be distinctive and not limited to only white and black.
Reduce the number of green plants on the table as they may cause jagged edges during subsequent reconstruction.
Meeting Room Scene Requirements
At least 3 people are required in the meeting scene, with clear roles: a presenter (standing) and audience members (sitting). There should be spatial relationships between the presenter and audience to create reasonable occlusion for subsequent depth estimation verification.
Small Pet Scene Requirements
At least 1 pet is required, initially planned to be lying on a table. The scene should include multiple pieces of furniture (refer to the images in questions 3 and 4) with reasonable occlusion relationships.
Select pets with different fur textures (similar to requiring distinctive clothing for human models).
Outdoor Street Scene Requirements
At least 10 people are required, moving naturally within the scene.
Multi-view Camera
Collection Equipment: a plenoptic camera with a resolution of not less than 2448×2048, plus a Hikvision MV-CS040-A0UC RGB planar array camera with a resolution of 2048×2048 and auto-exposure.
Shooting Distance: 2-3m (outdoor scenes)
Shooting Content: People interacting naturally with outdoor objects and environment, such as walking, cycling, playing basketball, table tennis, etc.
Other Shooting Requirements:
01. When shooting dynamic scenes, the camera position and perspective should remain fixed.
02. Use uniform diffuse lighting in outdoor scenes, avoid direct sunlight, and ensure consistent lighting throughout the shooting process.
03. Outdoor scenes include streets and tree-lined paths (no other dynamic people or objects except those specified).
Delivery Content: 3 outdoor scenes, each with 1 video segment of not less than 10 seconds at a frame rate of not less than 30 fps, totaling 3 × 300 = 900 frames.
Video sequences should be saved in png/point cloud/depth map format (multiple output formats supported by plenoptic camera).
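A quick sanity check on delivered raw YUV files follows from the format: planar YUV 4:2:0 has 1.5 samples per pixel (one full-resolution luma plane plus two quarter-resolution chroma planes). Storing the 10-bit and 16-bit samples in 2 bytes each is an assumption here (the common little-endian raw layout) and should be confirmed against the deliverable.

```python
def yuv420_frame_bytes(width, height, bytes_per_sample=2):
    """Bytes per frame of planar YUV 4:2:0: 1.5 samples per pixel.
    NOTE: 2 bytes per sample for 10-bit/16-bit data is an assumption
    (common little-endian raw layout); confirm with the deliverable."""
    return width * height * 3 // 2 * bytes_per_sample

def frame_count(file_size_bytes, width, height, bytes_per_sample=2):
    """Number of complete frames in a raw .yuv file of the given size."""
    return file_size_bytes // yuv420_frame_bytes(width, height, bytes_per_sample)

# A 10-second, 30 fps, 1920x1080 sequence should contain 300 frames:
per_frame = yuv420_frame_bytes(1920, 1080)       # 6,220,800 bytes per frame
print(frame_count(300 * per_frame, 1920, 1080))  # 300
```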
Additional Key Points:
Multi-camera Array: Multiple cameras (such as a 3×5 array) shoot simultaneously from different viewpoints to directly obtain multi-angle light information (more suitable for dynamic scenes).
Since the plenoptic camera cannot assign color information on its own, we worked with the manufacturer to reverse-engineer its imaging principle and added an RGB camera to provide color assignment.
Summary
This project combines multi-view camera array technology with plenoptic camera technology to build a 3D data collection and reconstruction system covering indoor and outdoor scenes, static and dynamic interactions, providing high-precision data support for depth estimation algorithm verification and virtual-real fusion content generation.
Data is the key to the success of artificial intelligence. We must strengthen data collection methods and data security to achieve more intelligent and efficient technical solutions. In a rapidly developing market, only through continuous innovation and optimization of artificial intelligence can we build a safer, more efficient, and more intelligent society. If you have data requirements, please contact Nexdata.ai at [email protected].