Traditional Chinese SMS Corpus – 10 Million Conversational Texts

Traditional Chinese SMS corpus

NLP Text Dataset

Traditional Chinese dataset

Chinese NLP dataset

Chinese text corpus

NLU training data

SMS corpus

This dataset is a large-scale Traditional Chinese conversational text dataset consisting of 10 million real-world SMS messages written in spoken-style Traditional Chinese. All content is provided in plain text (TXT) format. The dataset is well suited for training and evaluating large language models, dialogue systems, Chinese conversational text analysis, and related tasks.

This is a paid dataset, licensed for commercial use, research, and other purposes. Licensed ready-made datasets help jump-start AI projects.

Specifications

Data content: Traditional Chinese SMS corpus text data
Data size: 10 million messages
Collection period: 2014
Storage format: TXT
Language: Chinese
