82 Million Cantonese Script Data

Cantonese script data
Cantonese textual data
Cantonese text data collection
dialogue text data

Cantonese text data, 82 million texts in total, collected from Cantonese script text. The dataset can be used for natural language understanding, knowledge base construction, and other tasks.

Paid Datasets
This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Data content
82 million Cantonese script texts
Data size
82 million Cantonese script texts
Collecting period
2015
Storage format
txt
Language
Cantonese
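The specification gives the storage format as plain txt but does not state the encoding or how texts are laid out in the file. The following is a minimal sketch that assumes UTF-8 encoding and one Cantonese script text per line. Both of these are assumptions and are not confirmed above. It shows how the file could be streamed and counted without loading all 82 million texts into memory. The function names are hypothetical.

```python
from typing import Iterable, Iterator, List


def clean_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line stripped of surrounding whitespace, skipping blank lines.

    Assumption (not stated in the specification): one text per line.
    """
    for line in lines:
        text = line.strip()
        if text:
            yield text


def load_texts(path: str, encoding: str = "utf-8") -> List[str]:
    # Assumption: the .txt file is UTF-8 encoded.
    # Note: this holds every text in memory; prefer count_texts or
    # iterating clean_lines directly for very large files.
    with open(path, "r", encoding=encoding) as f:
        return list(clean_lines(f))


def count_texts(path: str, encoding: str = "utf-8") -> int:
    # Streams the file line by line, so memory use stays flat
    # regardless of how many texts the file contains.
    with open(path, "r", encoding=encoding) as f:
        return sum(1 for _ in clean_lines(f))
```

For example, `count_texts("cantonese_script.txt")` would count the non-blank lines in a hypothetical local copy of the file. If the delivered files use a different encoding, pass it through the `encoding` parameter.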
Sample
  • 82 Million Cantonese Script Data
Recommended Datasets
830,276 groups - Multi-Round Interpersonal Dialogues Text Data

This database is an interactive text corpus of real users on mobile phones. It has been desensitized so that it contains no private user information: A and B are codes that replace the sender and receiver, and sensitive information such as cellphone numbers and user names is replaced with '* * *'. The database can be used for tasks such as natural language understanding.

Interactive text corpus database
text corpus database
10 Million Traditional Chinese Oral Message Data

Traditional Chinese SMS corpus, 10 million messages in total, consisting of real spoken-language Traditional Chinese text. It contains only text messages, stored in txt format. The dataset can be used for natural language understanding and related tasks.

Traditional Chinese SMS corpus
Traditional Chinese SMS data
Traditional Chinese SMS collection
Traditional Chinese corpus data
13,000,000 Groups – Man-Machine Conversation Interactive Text Data

Human-machine dialogue interaction text data, 13 million groups in total. The data is interaction text between users and a robot. Each line represents one group of interaction text, with its parts separated by '|'. The dataset can be used for natural language understanding, knowledge base construction, and similar tasks.
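The line format described here (one interaction group per line, parts separated by '|') can be read with a simple split. The following is a minimal sketch with some assumptions. It assumes UTF-8 encoding and assumes that no part contains an escaped or literal '|'. It also drops empty fragments and ignores surrounding whitespace. None of these details are confirmed by the description above, and the function names are hypothetical.

```python
from typing import List


def parse_interaction_line(line: str, sep: str = "|") -> List[str]:
    """Split one line (one interaction group) into its separator-delimited parts.

    Assumptions: parts never contain the separator themselves, and
    leading/trailing whitespace around a part is not significant.
    """
    parts = [p.strip() for p in line.rstrip("\r\n").split(sep)]
    # Drop empty fragments produced by leading, trailing, or doubled separators.
    return [p for p in parts if p]


def parse_interaction_file(path: str, encoding: str = "utf-8") -> List[List[str]]:
    # Assumption: UTF-8 encoded text, one interaction group per line.
    groups: List[List[str]] = []
    with open(path, "r", encoding=encoding) as f:
        for line in f:
            parts = parse_interaction_line(line)
            if parts:  # skip blank lines
                groups.append(parts)
    return groups
```

Dropping empty fragments keeps the parsed output tidy. However, it would also hide a genuinely empty turn, such as a robot reply with no text. If that distinction matters, keep the raw `split(sep)` result instead.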

Human-machine dialogue interaction text data
human-machine dialogue text
human-machine dialogue data
dialogue text data
203,029 Groups - Chinese Medical Question Answering Data

The data contains 203,029 groups of Chinese question-answering data between doctors and patients, covering different diseases.

Medical question answering
disease