m.nexdata.datatang.com

570K Chinese LLM Content Safety Dataset

llm content safety dataset
ai content safety data
content safety training data
llm safety dataset
ai moderation dataset
harmful content dataset
llm alignment dataset

This dataset contains approximately 570,000 question–answer pairs. The data covers the 31 established content safety categories (CAC) along with additional emerging risk categories. All samples were written by professional annotators. The dataset can be used for tasks such as large language model training, safety evaluation, and supervised fine-tuning focused on content moderation and risk handling.

Paid Datasets
This is a paid dataset licensed for commercial use, research, and other purposes. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Data content: Text data for large language model content safety
Data size: About 570,000 question–answer pairs, covering the 31 CAC categories plus other new categories
Collection type: 41 major categories
Collection method: Written by professional annotators
Storage format: Excel
Language: Chinese