What Is an AI-powered Virtual Human?

From: Nexdata Date: 2024-04-03

According to the “Digital Virtual Human Depth Industry Report”, the overall market size of China’s digital virtual human industry will reach 270 billion by 2030.

A digital virtual human has the appearance of a human being; even the texture of its skin approaches that of a real person. It behaves like a human, expressing itself through language, facial expressions, or body movements, and it has human-like thoughts, interacting with people in real time in a way that is almost indistinguishable from a real human.

Mainstream virtual digital humans follow one of two technical routes: AI-driven and human-driven. A human-driven digital human is operated by a real person. The operator communicates with the user in real time based on the user video streamed from the camera system, while a motion-capture system maps the operator’s expressions and movements onto the virtual digital human’s image, allowing it to interact with the user.

An AI-driven digital human uses an intelligent system to automatically read, analyze, and recognize external input, decides the text the digital human should say based on that analysis, and then drives the character model to generate the corresponding speech and actions so the digital human can interact with the user. The character model is pre-trained with AI techniques and can generate speech and matching animation directly from text; in the industry, this is called a TTSA (Text To Speech & Animation) character model.
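
To make the flow concrete, here is a minimal Python sketch of the TTSA-style pipeline described above: analyze the input, decide the response text, then synthesize speech and matching animation. All names (`understand`, `synthesize`, `TTSAOutput`) are hypothetical placeholders for illustration, not an actual Nexdata or industry API.

```python
from dataclasses import dataclass

@dataclass
class TTSAOutput:
    """Speech audio plus the animation signals that drive the avatar."""
    audio_wav: bytes                  # synthesized speech waveform
    visemes: list[tuple[float, str]]  # (timestamp, mouth shape) pairs
    gestures: list[str]               # high-level body/gesture cues

def understand(user_input: str) -> str:
    """NLU step: decide what the digital human should say.
    Placeholder: a real system would call a dialogue model here."""
    return f"Here is my answer to: {user_input}"

def synthesize(text: str) -> TTSAOutput:
    """TTSA step: turn the response text into speech plus animation.
    Placeholder: a real system would call pre-trained TTS and
    animation models here."""
    visemes = [(i * 0.1, "neutral") for i, _ in enumerate(text.split())]
    return TTSAOutput(audio_wav=b"", visemes=visemes, gestures=["nod"])

def interact(user_input: str) -> TTSAOutput:
    # 1) analyze input -> 2) decide output text -> 3) drive the model
    response_text = understand(user_input)
    return synthesize(response_text)

if __name__ == "__main__":
    out = interact("What is a virtual human?")
    print(len(out.visemes), "viseme events,", out.gestures)
```

In a production system the placeholder bodies would call a dialogue model and pre-trained TTS and animation models, but the three-step structure stays the same.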

Because the technical routes differ, so do the application scenarios. AI-driven virtual humans are used in news broadcasting, customer service, product explanation, and similar scenarios, while motion-capture-driven virtual humans suit highly interactive scenarios such as MCN agency marketing, live streaming, and virtual anchoring.

Nexdata AI-powered Virtual Human Data Solutions

As a world-leading AI data services provider, Nexdata offers more than 500 off-the-shelf AI datasets covering computer vision, ASR, TTS, and NLP. We help our customers develop more interactive virtual digital humans through high-quality data services.

1. TTS Datasets

Chinese Mandarin Average Tone Speech Synthesis Corpus, General

100 People — Chinese Mandarin Average Tone Speech Synthesis Corpus, General. Recorded by native Chinese speakers, it covers news, dialogue, audiobooks, poetry, advertising, news broadcasting, and entertainment, with balanced phoneme and tone coverage. Professional phoneticians participate in the annotation, so the corpus precisely matches the research and development needs of speech synthesis.

American English Speech Synthesis Corpus-Female

Female audio data of American English, recorded by a native American English speaker with an authentic accent and a sweet voice. Phoneme coverage is balanced. Professional phoneticians participate in the annotation, so the corpus precisely matches the research and development needs of speech synthesis.

Japanese Synthesis Corpus-Female

10.4 Hours — Japanese Synthesis Corpus-Female. Recorded by a native Japanese speaker with an authentic accent. Phoneme coverage is balanced. Professional phoneticians participate in the annotation, so the corpus precisely matches the research and development needs of speech synthesis.

Chinese Mandarin Synthesis Corpus-Female, Emotional

13.3 Hours — Chinese Mandarin Synthesis Corpus-Female, Emotional. Recorded by a native Chinese speaker reading emotional text, with balanced syllable, phoneme, and tone coverage. Professional phoneticians participate in the annotation, so the corpus precisely matches the research and development needs of speech synthesis.

2. Computer Vision Datasets

Multi-pose and Multi-expression Face Data

1,507 People, 102,476 Images — Multi-pose and Multi-expression Face Data. The data includes 1,507 Asians (762 males, 745 females). For each subject, 62 multi-pose face images and 6 multi-expression face images were collected. The data diversity covers multiple angles, poses, and light conditions across all ages. This data can be used for tasks such as face recognition and facial expression recognition.

3D Instance Segmentation and 22 Landmarks Annotation Data of Human Body

18,880 Images of 466 People — 3D Instance Segmentation and 22 Landmarks Annotation Data of Human Body. The dataset diversity includes multiple scenes, light conditions, ages, shooting angles, and poses. For annotation, we adopted instance segmentation of the human body, and 22 landmarks were annotated for each body. The dataset can be used for tasks such as human body instance segmentation and human behavior recognition.

Human Pose Recognition Data

10,000 People — Human Pose Recognition Data. The dataset includes indoor and outdoor scenes and covers males and females, with ages ranging from teenagers to the elderly; middle-aged and young people are the majority. The data diversity includes different shooting heights, ages, light conditions, collection environments, seasonal clothing, and multiple human poses. For each subject, gender, race, age, collection environment, and clothing were annotated. The data can be used for human pose recognition and other tasks.

18_Gestures Recognition Data

314,178 Images — 18_Gestures Recognition Data. The data diversity includes multiple scenes, 18 gestures, 5 shooting angles, multiple ages, and multiple light conditions. For annotation, 21 gesture landmarks (each with a visible/invisible attribute), gesture type, and gesture attributes were annotated. This data can be used for tasks such as gesture recognition and human-machine interaction.
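
As an illustration of how per-landmark visibility annotations like these are often structured, here is a minimal Python sketch. The JSON layout and field names are assumptions for illustration only, not Nexdata’s actual delivery format.

```python
import json
from dataclasses import dataclass

@dataclass
class Landmark:
    x: float        # pixel x-coordinate
    y: float        # pixel y-coordinate
    visible: bool   # per-landmark visibility attribute

@dataclass
class GestureAnnotation:
    image: str
    gesture_type: str           # one of the 18 gesture classes
    landmarks: list[Landmark]   # up to 21 landmarks per gesture

# Hypothetical example record mimicking the description above.
record = json.loads("""
{
  "image": "gesture_000001.jpg",
  "gesture_type": "thumbs_up",
  "landmarks": [{"x": 312.5, "y": 208.0, "visible": true}]
}
""")

ann = GestureAnnotation(
    image=record["image"],
    gesture_type=record["gesture_type"],
    landmarks=[Landmark(**lm) for lm in record["landmarks"]],
)
print(ann.gesture_type, len(ann.landmarks), "landmark(s) parsed")
```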

3. NLP Datasets

Chinese-English Parallel Corpus Data

3,060,000 pairs of Chinese-English parallel translation corpus, stored in txt files. It covers fields such as travel, medicine, daily life, and TV drama. Data cleaning, desensitization, and quality inspection have been carried out. It can serve as a base corpus for text data analysis as well as for machine translation.
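
The exact layout of the txt files is not specified above; assuming the common convention of one tab-separated sentence pair per line, a minimal loading sketch in Python might look like this (the file name is made up for illustration):

```python
from pathlib import Path

def load_parallel_corpus(path: str) -> list[tuple[str, str]]:
    """Load sentence pairs from a txt file, assuming one
    tab-separated (source, target) pair per line. The layout
    is an assumption; adjust the split for the actual format."""
    pairs = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[0] and parts[1]:  # skip malformed lines
            pairs.append((parts[0].strip(), parts[1].strip()))
    return pairs

# Hypothetical usage:
# pairs = load_parallel_corpus("zh_en_corpus.txt")
# print(len(pairs), "sentence pairs loaded")
```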

Japanese-English Parallel Corpus Data

Japanese-English parallel corpus, 380,000 pairs in total. Political, pornographic, personal, and other sensitive content has been excluded. It can serve as a base corpus for text data analysis and be used in machine translation and other fields.

English-Korean Parallel Corpus Data

English-Korean parallel corpus, 1,340,000 pairs in total. Political, pornographic, personal, and other sensitive content has been excluded. It can serve as a base corpus for text data analysis and be used in machine translation and other fields.

English-Russian Parallel Corpus Data

English-Russian parallel corpus, 1,080,000 pairs in total. Political, pornographic, personal, and other sensitive content has been excluded. It can serve as a base corpus for text data analysis and be used in machine translation and other fields.

End

If you need data services, please feel free to contact us at info@nexdata.com.
