Traditional Chinese SMS Corpus – 10 Million Conversational Texts

Traditional Chinese SMS corpus
NLP Text Dataset
Traditional Chinese dataset
Chinese NLP dataset
Chinese text corpus
NLU training data
SMS corpus

This dataset is a large-scale Traditional Chinese conversational text corpus consisting of 10 million real-world SMS messages written in spoken-style Traditional Chinese. All content is provided in plain text (TXT) format. The dataset is well suited for training and evaluating large language models, dialogue systems, Chinese conversational text analysis, and related tasks.

Paid Datasets
This is a paid dataset licensed for commercial use, research, and other purposes. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Data content: Traditional Chinese SMS corpus text data
Data size: 10 million SMS messages
Collection period: 2014
Storage format: txt
Language: Chinese
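
Because the corpus ships as plain text, it can be streamed without loading everything into memory. The following is a minimal sketch only: the listing does not state the file layout or encoding, so the one-message-per-line layout, the UTF-8 encoding, and the names `iter_messages` and `corpus_stats` are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Dict, Iterator


def iter_messages(path: Path, encoding: str = "utf-8") -> Iterator[str]:
    """Yield non-empty lines from a text file.

    Assumes one SMS message per line and UTF-8 encoding; neither is
    confirmed by the dataset listing.
    """
    with path.open("r", encoding=encoding) as handle:
        for line in handle:
            message = line.strip()
            if message:  # skip blank lines
                yield message


def corpus_stats(path: Path) -> Dict[str, float]:
    """Count messages and compute the average length in characters."""
    count = 0
    total_chars = 0
    for message in iter_messages(path):
        count += 1
        total_chars += len(message)
    average = total_chars / count if count else 0.0
    return {"messages": count, "avg_chars": average}
```

Calling `corpus_stats(Path("sms_corpus.txt"))` on a hypothetical file name would return the message count and mean message length, which is a quick sanity check before feeding the text into a tokenizer or training pipeline.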