What Is an AI-powered Virtual Human?

From: Nexdata Date: 2024-04-03

According to the “Digital Virtual Human Depth Industry Report”, the overall market size of China’s digital virtual human industry will reach 270 billion by 2030.

A digital virtual human has the appearance of a human being; even the texture of its skin approaches that of a real person. It behaves like a human, expressing itself through language, facial expressions, or body movements, and it has human-like thoughts, interacting with people in real time in a way that is almost indistinguishable from a real human.

Mainstream virtual digital humans follow one of two technical routes: AI-driven and human-driven. A human-driven digital human is operated by a real person. The operator communicates with the user in real time based on the user video streamed from the camera system, while a motion-capture system maps the operator’s expressions and movements onto the virtual digital human’s image, allowing it to interact with the user.

An AI-driven digital human uses an intelligent system to automatically read, analyze, and recognize external input, decides the text the digital human should say based on that analysis, and then drives the character model to generate the corresponding speech and actions so the digital human can interact with the user. The character model is pre-trained with AI techniques and can generate speech and matching animation directly from text; in the industry, this is called a TTSA (Text To Speech & Animation) character model.
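
To make the flow concrete, here is a minimal Python sketch of the TTSA-style pipeline described above: analyze the input, decide the response text, then synthesize speech and matching animation. All names (`understand`, `synthesize`, `TTSAOutput`) are hypothetical placeholders for illustration, not an actual Nexdata or industry API.

```python
from dataclasses import dataclass

@dataclass
class TTSAOutput:
    """Speech audio plus the animation signals that drive the avatar."""
    audio_wav: bytes                  # synthesized speech waveform
    visemes: list[tuple[float, str]]  # (timestamp, mouth shape) pairs
    gestures: list[str]               # high-level body/gesture cues

def understand(user_input: str) -> str:
    """NLU step: decide what the digital human should say.
    Placeholder: a real system would call a dialogue model here."""
    return f"Here is my answer to: {user_input}"

def synthesize(text: str) -> TTSAOutput:
    """TTSA step: turn the response text into speech plus animation.
    Placeholder: a real system would call pre-trained TTS and
    animation models here."""
    visemes = [(i * 0.1, "neutral") for i, _ in enumerate(text.split())]
    return TTSAOutput(audio_wav=b"", visemes=visemes, gestures=["nod"])

def interact(user_input: str) -> TTSAOutput:
    # 1) analyze input -> 2) decide output text -> 3) drive the model
    response_text = understand(user_input)
    return synthesize(response_text)

if __name__ == "__main__":
    out = interact("What is a virtual human?")
    print(len(out.visemes), "viseme events,", out.gestures)
```

In a production system the placeholder bodies would call a dialogue model and pre-trained TTS and animation models, but the three-step structure stays the same.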

Because the technical routes differ, so do the application scenarios. AI-driven virtual humans are used in news broadcasting, customer service, product explanation, and similar scenarios, while motion-capture-driven virtual humans suit highly interactive scenarios such as MCN agency marketing, live streaming, and virtual anchoring.

Nexdata AI-powered Virtual Human Data Solutions

As a world-leading AI data services provider, Nexdata offers more than 500 off-the-shelf AI datasets covering computer vision, ASR, TTS, and NLP. We help our customers develop more interactive virtual digital humans through high-quality data services.

1. TTS Datasets

Chinese Mandarin Average Tone Speech Synthesis Corpus, General

100 People — Chinese Mandarin Average Tone Speech Synthesis Corpus, General. Recorded by native Chinese speakers, it covers news, dialogue, audiobooks, poetry, advertising, news broadcasting, and entertainment, with balanced phoneme and tone coverage. Professional phoneticians participate in the annotation, so the corpus precisely matches the research and development needs of speech synthesis.

American English Speech Synthesis Corpus-Female

Female audio data of American English, recorded by a native American English speaker with an authentic accent and a sweet voice. Phoneme coverage is balanced. Professional phoneticians participate in the annotation, so the corpus precisely matches the research and development needs of speech synthesis.

Japanese Synthesis Corpus-Female

10.4 Hours — Japanese Synthesis Corpus-Female. Recorded by a native Japanese speaker with an authentic accent. Phoneme coverage is balanced. Professional phoneticians participate in the annotation, so the corpus precisely matches the research and development needs of speech synthesis.

Chinese Mandarin Synthesis Corpus-Female, Emotional

13.3 Hours — Chinese Mandarin Synthesis Corpus-Female, Emotional. Recorded by a native Chinese speaker reading emotional text, with balanced syllable, phoneme, and tone coverage. Professional phoneticians participate in the annotation, so the corpus precisely matches the research and development needs of speech synthesis.

2. Computer Vision Datasets

Multi-pose and Multi-expression Face Data

1,507 People, 102,476 Images — Multi-pose and Multi-expression Face Data. The data includes 1,507 Asians (762 males, 745 females). For each subject, 62 multi-pose face images and 6 multi-expression face images were collected. The data diversity covers multiple angles, poses, and light conditions across all ages. This data can be used for tasks such as face recognition and facial expression recognition.

3D Instance Segmentation and 22 Landmarks Annotation Data of Human Body

18,880 Images of 466 People — 3D Instance Segmentation and 22 Landmarks Annotation Data of Human Body. The dataset diversity includes multiple scenes, light conditions, ages, shooting angles, and poses. For annotation, we adopted instance segmentation of the human body, and 22 landmarks were annotated for each body. The dataset can be used for tasks such as human body instance segmentation and human behavior recognition.

Human Pose Recognition Data

10,000 People — Human Pose Recognition Data. The dataset includes indoor and outdoor scenes and covers males and females, with ages ranging from teenagers to the elderly; middle-aged and young people are the majority. The data diversity includes different shooting heights, ages, light conditions, collection environments, seasonal clothing, and multiple human poses. For each subject, gender, race, age, collection environment, and clothing were annotated. The data can be used for human pose recognition and other tasks.

18_Gestures Recognition Data

314,178 Images — 18_Gestures Recognition Data. The data diversity includes multiple scenes, 18 gestures, 5 shooting angles, multiple ages, and multiple light conditions. For annotation, 21 gesture landmarks (each with a visible/invisible attribute), gesture type, and gesture attributes were annotated. This data can be used for tasks such as gesture recognition and human-machine interaction.
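
As an illustration of how per-landmark visibility annotations like these are often structured, here is a minimal Python sketch. The JSON layout and field names are assumptions for illustration only, not Nexdata’s actual delivery format.

```python
import json
from dataclasses import dataclass

@dataclass
class Landmark:
    x: float        # pixel x-coordinate
    y: float        # pixel y-coordinate
    visible: bool   # per-landmark visibility attribute

@dataclass
class GestureAnnotation:
    image: str
    gesture_type: str           # one of the 18 gesture classes
    landmarks: list[Landmark]   # up to 21 landmarks per gesture

# Hypothetical example record mimicking the description above.
record = json.loads("""
{
  "image": "gesture_000001.jpg",
  "gesture_type": "thumbs_up",
  "landmarks": [{"x": 312.5, "y": 208.0, "visible": true}]
}
""")

ann = GestureAnnotation(
    image=record["image"],
    gesture_type=record["gesture_type"],
    landmarks=[Landmark(**lm) for lm in record["landmarks"]],
)
print(ann.gesture_type, len(ann.landmarks), "landmark(s) parsed")
```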

3. NLP Datasets

Chinese-English Parallel Corpus Data

3,060,000 pairs of Chinese-English parallel translation corpus, stored in txt files. It covers fields such as travel, medicine, daily life, and TV drama. Data cleaning, desensitization, and quality inspection have been carried out. It can serve as a base corpus for text data analysis as well as for machine translation.
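
The exact layout of the txt files is not specified above; assuming the common convention of one tab-separated sentence pair per line, a minimal loading sketch in Python might look like this (the file name is made up for illustration):

```python
from pathlib import Path

def load_parallel_corpus(path: str) -> list[tuple[str, str]]:
    """Load sentence pairs from a txt file, assuming one
    tab-separated (source, target) pair per line. The layout
    is an assumption; adjust the split for the actual format."""
    pairs = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        parts = line.split("\t")
        if len(parts) == 2 and parts[0] and parts[1]:  # skip malformed lines
            pairs.append((parts[0].strip(), parts[1].strip()))
    return pairs

# Hypothetical usage:
# pairs = load_parallel_corpus("zh_en_corpus.txt")
# print(len(pairs), "sentence pairs loaded")
```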

Japanese-English Parallel Corpus Data

Japanese-English parallel corpus, 380,000 pairs in total. Political, pornographic, personal, and other sensitive content has been excluded. It can serve as a base corpus for text data analysis and be used in machine translation and other fields.

English-Korean Parallel Corpus Data

English-Korean parallel corpus, 1,340,000 pairs in total. Political, pornographic, personal, and other sensitive content has been excluded. It can serve as a base corpus for text data analysis and be used in machine translation and other fields.

English-Russian Parallel Corpus Data

English-Russian parallel corpus, 1,080,000 pairs in total. Political, pornographic, personal, and other sensitive content has been excluded. It can serve as a base corpus for text data analysis and be used in machine translation and other fields.

End

If you need data services, please feel free to contact us at info@nexdata.com.
