
Embodied Intelligence 101: IShowSpeed Dances with an Advanced Robot in Shenzhen

From: Nexdata    Date: 2025-05-20

A video of IShowSpeed dancing energetically with a lifelike robot in Shenzhen has gone viral. The clip pairs IShowSpeed's raw energy with the robot's mechanical precision, and both reflect the growing momentum behind embodied intelligence. At the heart of this technology is embodied intelligence data, which enables machines to sense, move, and interact more like humans.

Embodied AI is a frontier field at the intersection of artificial intelligence and robotics. It holds that intelligent agents achieve autonomous learning and evolution through dynamic interaction between the body and the environment; its core is the deep integration of perception, action, and cognition. In layman's terms, it is intelligence that inhabits a body. Broadly speaking, anything that combines on-device intelligence with some functions of the human body can be called embodied intelligence, such as robots or assisted-driving cars.

To build an embodied intelligent robot, a variety of training data must be mastered: multimodal perception data (vision, hearing, and awareness of the robot's own state and surroundings), motion control data (motion trajectories, motion effects, force control, and fault conditions), physical simulation data, multi-agent collaboration data, safety and ethics data, and knowledge base data.
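As an illustration only, the sketch below shows how one training sample might combine these categories in code. The schema and every field name are hypothetical, not a standard or Nexdata format.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical schema for one embodied-AI training sample, mirroring
# the data categories listed above. All field names are illustrative.

@dataclass
class MultimodalPerception:
    rgb_frames: List[bytes]        # vision: camera images
    audio_clip: Optional[bytes]    # hearing: microphone input
    joint_states: List[float]      # the robot's own state (proprioception)
    depth_map: Optional[bytes]     # surrounding-environment geometry

@dataclass
class MotionControl:
    trajectory: List[List[float]]  # waypoints in joint space
    forces: List[float]            # force-control readings
    succeeded: bool                # motion effect / fault condition

@dataclass
class EmbodiedSample:
    perception: MultimodalPerception
    control: MotionControl
    simulated: bool                # physical-simulation vs. real-world data
    agent_count: int               # >1 for multi-agent collaboration
    safety_labels: List[str]       # safety and ethics annotations
    task_text: str                 # knowledge-base / language context
```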

Multimodal sensory data plays a vital role in embodied intelligence. Imagine a person picking up a cup from the ground: the action decomposes into three movements, grasping, lifting, and holding, and each one depends on sensory feedback. To promote the development of embodied intelligence, the academic community has gradually open-sourced some representative datasets.
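Purely as a sketch, such a sense-act loop might look like the following. The robot interface used here (camera, gripper, arm, touch and force sensors) is entirely hypothetical and stands in for whatever perception and control APIs a real platform exposes.

```python
# Toy sense-act loop for the cup example. The `robot` interface
# (camera, gripper, arm, sensors) is hypothetical, not a real API.

def pick_up_cup(robot, max_load_newtons=5.0):
    # Grasp: vision locates the cup, touch confirms contact.
    cup_pose = robot.camera.locate("cup")
    robot.gripper.move_to(cup_pose)
    robot.gripper.close()
    if not robot.touch_sensor.in_contact():
        raise RuntimeError("grasp failed: no contact detected")

    # Lift: proprioception tracks the arm while force control limits load.
    robot.arm.raise_by(meters=0.3)
    if robot.force_sensor.load() > max_load_newtons:
        robot.gripper.open()  # fault handling: release and abort
        raise RuntimeError("lift aborted: object too heavy")

    # Hold: maintain a steady grip force while carrying the cup.
    robot.gripper.hold(force=robot.force_sensor.load())
```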

Meta AI released OpenEQA, an open-source benchmark dataset that measures an artificial intelligence system's capacity for embodied question answering: developing an understanding of the real world that allows it to answer natural language questions about an environment.
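As a rough sketch, an EQA-style benchmark can be scored by comparing a model's answers against reference answers. The JSON layout and field names below are assumptions for illustration, not OpenEQA's documented schema, and exact string matching stands in for the more tolerant scoring a real benchmark would use.

```python
import json

# Hypothetical EQA-style evaluation loop. The {"question", "answer"}
# record layout is assumed for illustration only.

def evaluate(model, path="questions.json"):
    with open(path) as f:
        records = json.load(f)
    correct = 0
    for item in records:
        prediction = model.answer(item["question"])  # any QA system
        if prediction.strip().lower() == item["answer"].strip().lower():
            correct += 1
    return correct / len(records)
```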

AgiBot World: launched by AgiBot, the Shanghai AI Laboratory, and others; it covers 100+ robots, 100+ scenarios, and over 1 million operation trajectories, with data modalities including vision, touch, and motion trajectories.

ARIO (All Robots In One): launched by Pengcheng Laboratory, Southern University of Science and Technology, Sun Yat-sen University, and others; it covers more than 320,000 tasks, 258 scenes, and over 3 million operation trajectories, with data modalities including 2D/3D vision, touch, sound, and text, drawn from both simulated and real sources.
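For illustration, one operation trajectory in a dataset of this kind might be represented roughly as below. The schema is hypothetical and simply mirrors the modalities listed above, not the real format of either dataset.

```python
# Hypothetical record for one operation trajectory, mirroring the
# modalities of datasets like AgiBot World and ARIO.

trajectory_record = {
    "task": "place the mug on the shelf",      # text modality
    "scene_id": 42,
    "source": "real",                          # "real" or "simulated"
    "steps": [
        {
            "rgb": "frame_0001.png",           # 2D vision
            "point_cloud": "frame_0001.ply",   # 3D vision
            "tactile": [0.12, 0.08, 0.00],     # touch readings
            "audio": "step_0001.wav",          # sound
            "joints": [0.1, -0.4, 0.9, 0.0, 0.3, 1.2],  # motion trajectory
        },
        # ...one entry per timestep
    ],
}
```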

Although open-source data can cover basic needs, finer control over a model calls for more precise data. Nexdata offers customized data services and high-quality off-the-shelf datasets; popular recommendations follow.

5,808 People - Human Pose Recognition Data

This dataset covers indoor and outdoor scenes and includes both males and females. The age distribution ranges from teenagers to the elderly, with young and middle-aged people in the majority. Data diversity spans different shooting heights, ages, light conditions, collection environments, seasonal clothing, and a variety of human poses. For each subject, labels for gender, race, age, collection environment, and clothing are annotated. The data can be used for human pose recognition and similar tasks.
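Illustratively, a per-subject annotation might look like the record below; the field names and layout are hypothetical, and the actual delivery format may differ.

```python
# Hypothetical per-subject annotation for a pose dataset like the one
# above; the actual delivery format may differ.

annotation = {
    "subject_id": "P00001",
    "gender": "female",
    "race": "Asian",
    "age": 34,
    "environment": "outdoor",          # collection environment
    "clothing_season": "winter",
    "keypoints": {                     # 2D pose, (x, y) pixel coordinates
        "left_shoulder": (412, 233),
        "right_shoulder": (498, 230),
        # ...remaining body joints
    },
}
```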

558,870 Videos - 50 Types of Dynamic Gesture Recognition Data

The collection scenes of this dataset include indoor and outdoor settings (natural scenery, street views, squares, etc.). The data covers males and females, with ages ranging from teenagers to seniors. Data diversity spans multiple scenes, 50 types of dynamic gestures, 5 camera angles, multiple light conditions, and different shooting distances. The data can be used for dynamic gesture recognition in smart homes, audio equipment, and in-vehicle systems.

58,255 Images - Object Detection Data in Construction Site Scenes

The collection scenes include indoor and outdoor settings, and the subjects are Asian. The data covers multiple devices, lighting conditions, scenes, and collection time periods. It can be used for tasks such as safety helmet, reflective vest, and human body detection.

If you are interested in exploring more of Nexdata's high-quality datasets, please visit our dataset catalog, or leave a message describing your specific data needs.
