Case Study: COT-VLA Robotic Arm Annotation Project

From: Nexdata    Date: 12/04/2025

In 2025, our team tackled this problem through the Embodied AI COT-VLA project. This is a multi-task annotation project for robotic arm operation scenarios, consisting of three tasks: image trajectory annotation, image point annotation, and multi-view object annotation. The working language is English. The aim is to use manually annotated data for model training to improve the accuracy and performance of the embodied manipulation model.

Core Annotation Tasks
Image Trajectory Annotation
Each data item contains one image and five candidate questions. The annotator first reviews the five candidates and selects the one that is closely related to the image, accurately described, and not phrased too directly. The annotator then marks the trajectory points on the left-hand image accurately and evenly, following the selected question's description, so that the robotic arm's movement matches the text.
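The item and its annotation can be pictured roughly as the record below; the field names and types are illustrative assumptions, not the project's actual data format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical record for one trajectory-annotation item; the field names
# are illustrative, not the project's actual schema.
@dataclass
class TrajectoryItem:
    image_path: str                                  # the robotic-arm scene image
    candidate_questions: List[str]                   # the five candidate questions
    selected_question: int = -1                      # index (0-4) of the chosen question
    trajectory: List[Tuple[float, float]] = field(default_factory=list)
    # trajectory points (x, y) in image coordinates, marked evenly along the
    # path the arm should follow according to the chosen question
```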

Point Annotation
Each data item again contains one image and five candidate questions. The annotator chooses the question that is closely related to the image, accurately described, and not framed from an overly simplistic perspective, and then labels one or more points on the image according to the selected question.
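A minimal sketch of how such a point annotation could be sanity-checked, assuming the points are expressed in pixel coordinates; the function and its bounds are illustrative, not the project's actual QC rules.

```python
# Minimal sanity check for a point-annotation result; argument names and
# bounds are assumptions rather than the project's actual QC rules.
def validate_point_annotation(selected_question, points, image_width, image_height):
    if not 0 <= selected_question < 5:          # exactly one of the five candidates chosen
        return False
    if len(points) < 1:                         # at least one labeled point required
        return False
    return all(0 <= x < image_width and 0 <= y < image_height
               for x, y in points)              # every point must fall inside the image
```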

Multi-View Annotation
Each data item contains two images of the same scene taken from different perspectives, Viewpoint 1 and Viewpoint 2, together with multiple pairs of object points and their names. Each pair of object points corresponds to the same object seen from the two perspectives. When labeling, the position of each labeled point must match the object's position in both perspectives, following the rules below (a small validation sketch follows the list):
1) Label an object only if it appears in both perspectives, focusing primarily on items on the table;
2) If there are many pairs of objects, label up to 5 pairs. If there are enough object types, avoid labeling the same type of object repeatedly;
3) Item names must be in English. If an item's exact name cannot be determined, a description of its appearance can be used instead.
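The rules above lend themselves to a simple automated pre-check. The sketch below assumes each pair is stored as a small dictionary with a name and one point per viewpoint, which is an assumption about the data layout rather than the project's actual tooling.

```python
# Rough sketch of how rules 1-3 could be checked automatically; the pair
# structure and the checks are illustrative, not the project's QC tooling.
def check_multiview_pairs(pairs):
    """pairs: list of dicts such as
       {"name": "red mug", "view1": (x1, y1), "view2": (x2, y2)}"""
    if len(pairs) > 5:                                   # rule 2: at most 5 pairs
        return False
    names = [p["name"].strip().lower() for p in pairs]
    if len(set(names)) != len(names):                    # rule 2: avoid repeating the same type
        return False
    for p in pairs:
        if p.get("view1") is None or p.get("view2") is None:
            return False                                 # rule 1: object must appear in both views
        if not p["name"].isascii():                      # rule 3: names must be in English
            return False
    return True
```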

Implementation Challenges
Because the client's platform lacks strict data isolation and locking at each stage, other annotators can still access and modify data while it is being annotated or after it has been completed. To avoid conflicts between teams, tasks were assigned according to each team's manpower and capabilities, with each task executed by only one team. Therefore, we adopted an account isolation system. After annotators complete their work, the team's quality control staff check the data, which keeps the checks complete and thorough and makes it easy to confirm which records have already been quality checked.
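Conceptually, the arrangement can be pictured as a fixed task-to-team mapping plus a per-record QC flag, as in the sketch below; the task names, team names, and record fields are hypothetical and only illustrate the idea.

```python
# Conceptual sketch of account isolation and QC tracking; the task names,
# team names, and record fields are hypothetical, not the client platform's
# actual data model.
TASK_TO_TEAM = {"trajectory": "team_A", "point": "team_B", "multi_view": "team_C"}

def can_edit(record, account):
    # an account may only touch records belonging to the task assigned to its team
    return TASK_TO_TEAM.get(record["task"]) == account["team"]

def mark_checked(record, qc_account):
    # QC flags every record it reviews so checked data is easy to screen for
    record["qc_status"] = "checked"
    record["qc_by"] = qc_account["name"]
    return record
```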

Results and Experience
Due to the client's tight development timeline, tasks were created in batches to avoid losing completed work during supplier-side quality control. However, considering the supplier's manpower, the timeline, and the data volume, having each task executed by a single team was deemed necessary to meet the client's expectations, so each team executed exactly one task. When a record was found to have been annotated multiple times during quality control, the duplicate annotations were cleaned up and only the most accurate one was retained.
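The cleanup step can be summarized by the small sketch below, where quality_score is a hypothetical stand-in for the reviewers' judgment of which annotation is most correct.

```python
# Illustrative cleanup for records annotated more than once; quality_score
# stands in for the reviewers' judgment of which annotation is most correct.
def keep_best_annotation(annotations, quality_score):
    best = max(annotations, key=quality_score)   # keep the most accurate annotation
    return best                                  # the remaining copies are discarded
```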

Conclusion
The COT-VLA project demonstrates how structured annotation transforms AI concepts into practical robotic capabilities. By focusing on clear guidelines, solving data conflicts, and maintaining strict quality control, we created a scalable framework applicable to various industrial robots. Practice has shown that Nexdata is capable of meeting customer needs on time and with high quality. If you have data collection/annotation requirements, please contact Nexdata.ai at [email protected]. We look forward to supporting your AI projects.