From: Nexdata Date: 2024-08-15
The quality and diversity of datasets determine how capable an AI model can be. Whether the model is used for smart security, autonomous driving, or human-machine interaction, the accuracy of its datasets directly affects its performance. As data collection technology develops, all types of customized datasets are being created to support the optimization of AI algorithms. Through in-depth research on these datasets, the application prospects of AI technology will become broader.
According to the “2021 China Intelligent Customer Service Market Report”, China’s intelligent customer service market reached 3.01 billion yuan in 2020, a year-on-year increase of 88.1%. The market is expected to exceed 10 billion yuan by 2025, showing a rapid growth trend.
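As a rough check on these figures, the arithmetic can be sketched in Python. The variable names are illustrative, and the 10 billion yuan figure is treated as a floor because the report expects the market to exceed it, so the growth rate computed here is only a lower bound under that assumption.

```python
# Figures quoted above from the report (billions of yuan).
market_2020 = 3.01
yoy_growth_2020 = 0.881      # 88.1% year-on-year increase
target_2025 = 10.0           # assumption: treated as a floor ("may exceed")

# Market size in 2019 implied by the 2020 figure and its growth rate.
implied_2019 = market_2020 / (1 + yoy_growth_2020)

# Minimum compound annual growth rate needed over 2020-2025 (5 years).
years = 2025 - 2020
min_cagr = (target_2025 / market_2020) ** (1 / years) - 1

print(f"Implied 2019 market: {implied_2019:.2f} billion yuan")
print(f"Minimum annual growth 2020-2025: {min_cagr:.1%}")
```

Under these assumptions, the implied 2019 market is about 1.6 billion yuan, and growth of roughly 27% a year would be enough to pass 10 billion yuan by 2025, well below the 88.1% reported for 2020.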
By using natural language processing (NLP), automatic speech recognition (ASR), and related technologies, intelligent customer service systems can process text and speech far more capably. They offer clear advantages in access channels, response efficiency, and data management and analysis, and they improve work efficiency.
However, customers raise every kind of unusual question, and intelligent customer service robots are often helpless when faced with complex ones. AI and ML applications are machines with their own limitations: they can only work from the data they have been given. When a query or conversation goes beyond that data, these tools can get stuck or give false or irrelevant answers. In addition, most intelligent robots on the market cannot read between the lines or fully understand context. Intelligent customer service still has a long way to go before it can replace human customer service.
As a world-leading AI data services provider, Nexdata is committed to providing high-quality customer service data solutions that empower the industry and help bring the technology into more application scenarios.
20 Hours — American English Speech Synthesis Corpus-Male
Male American English audio data. It is recorded by native American English speakers with authentic accents. The phoneme coverage is balanced, and professional phoneticians participate in the annotation. It closely matches the research and development needs of speech synthesis.
19.46 Hours — American English Speech Synthesis Corpus-Female
Female American English audio data. It is recorded by a native American English speaker with an authentic accent and a sweet voice. The phoneme coverage is balanced, and professional phoneticians participate in the annotation. It closely matches the research and development needs of speech synthesis.
10.4 Hours — Japanese Synthesis Corpus-Female
Female Japanese audio data. It is recorded by a native Japanese speaker with an authentic accent. The phoneme coverage is balanced, and professional phoneticians participate in the annotation. It closely matches the research and development needs of speech synthesis.
50 People — Chinese-English Mixed Average Tone Speech Synthesis Corpus-Customer Service
Chinese-English mixed speech synthesis data from 50 speakers. It is recorded by native Chinese speakers reading customer service text, and the syllables, phonemes, and tones are balanced. Professional phoneticians participate in the annotation. It closely matches the research and development needs of speech synthesis.
2,520 Hours — Real-time Speech Assistant Mandarin Speech Data
This dataset contains customer consultation data from a well-known voice assistant in real use: actual consultation recordings between customer service staff and customers. The valid duration is 2,520 hours. The recordings were collected in a quiet indoor environment and include some noise that does not affect speech recognition. All transcripts are highly accurate, having been manually transcribed and proofread by professional annotators.
800 Hours — English Real-time Speech Data of Typical-fields Customer Service
English customer service speech data with a duration of 800 hours, collected from real scenes and recording real interactions between customer service staff and customers. It comes from customer service centers and covers multiple fields. Text content, speaker identity and gender, sensitive information, and other attributes are annotated.
317 Hours — Cantonese Real-time Speech Data of Real estate Customer Service
Cantonese customer service speech data with a duration of 317 hours, collected from real scenes and recording real interactions between customer service staff and customers. It comes from customer service centers. Text content, speaker identity and gender, sensitive information, and other attributes are annotated.
If you need data services, please feel free to contact us: info@nexdata.ai.
In an era of deep integration between data and artificial intelligence, the richness and quality of datasets will directly determine how far AI technology can go. In the future, the effective use of data will drive innovation and bring more growth and value to all walks of life. With the help of automatic labeling tools, generative adversarial networks (GANs), or data augmentation techniques, we can improve the efficiency of data annotation and reduce labor costs.
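To make the idea of data augmentation concrete, here is a minimal sketch in Python of one common technique for speech data: mixing white noise into a clean recording at a chosen signal-to-noise ratio, so that one labeled utterance yields several training variants without new annotation work. This is an illustration only, not a description of Nexdata's tooling; the function name `add_noise_at_snr`, the 20 dB setting, and the synthetic tone standing in for a recording are all assumptions.

```python
import math
import random


def add_noise_at_snr(samples, snr_db, seed=0):
    """Return a copy of `samples` with white Gaussian noise mixed in
    so that the signal-to-noise ratio is approximately `snr_db` dB."""
    if not samples:
        raise ValueError("samples must not be empty")
    rng = random.Random(seed)  # seeded for reproducible augmentation
    signal_power = sum(s * s for s in samples) / len(samples)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


# Assumption: one second of a 440 Hz tone at 16 kHz stands in for
# a real recorded utterance.
sample_rate = 16000
clean = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
         for n in range(sample_rate)]

# The augmented copy keeps the original transcript as its label.
noisy = add_noise_at_snr(clean, snr_db=20)
```

Because the augmented audio reuses the existing transcript, each extra variant adds training data without additional manual labeling, which is where the cost saving would come from.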