[{"@type":"PropertyValue","name":"Data content","value":"Large Language Model content safety considerations text data"},{"@type":"PropertyValue","name":"Data size","value":"About 570,000 sets of question and answer data; covering 31 categories of CAC + other new categories"},{"@type":"PropertyValue","name":"Collecting type","value":"41 major categories"},{"@type":"PropertyValue","name":"Collecting method","value":"written by professional annotators"},{"@type":"PropertyValue","name":"Storage format","value":"Excel"},{"@type":"PropertyValue","name":"Language","value":"Chinese"}]
{"id":1349,"datatype":"1","titleimg":"https://bj-oss-datatang-03.oss-cn-beijing.aliyuncs.com/asset/productNew/nexdata/APY231130003.jpg?Expires=4102329599&OSSAccessKeyId=LTAI8NWs2pDolLNH&Signature=%2FI2D2Uu3IrX7QWPTf3nWBLoDXeo%3D","type1":"226","type1str":null,"type2":"228","type2str":null,"dataname":"570K Chinese LLM Content Safety Dataset","datazy":[{"title":"Data content","content":"Large Language Model content safety considerations text data","desc":"Data content"},{"title":"Data size","content":"About 570,000 sets of question and answer data; covering 31 categories of CAC + other new categories","desc":"Data size"},{"title":"Collecting type","content":"41 major categories","desc":"Collecting type"},{"title":"Collecting method","content":"written by professional annotators","desc":"Collecting method"},{"title":"Storage format","content":"Excel","desc":"Storage format"},{"title":"Language","content":"Chinese","desc":"Language"}],"datatag":"Content safety,Text,LLM","technologydoc":null,"downurl":null,"datainfo":null,"standard":null,"dataylurl":null,"flag":null,"publishtime":null,"createby":null,"createtime":null,"ext1":null,"samplestoreloc":null,"hosturl":null,"datasize":null,"industryPlan":null,"keyInformation":"","samplePresentation":[{"name":"1.png","url":"https://storage-product.datatang.com/damp/product/sample_presentation/20250728171244/1.png?Expires=4102415999&OSSAccessKeyId=LTAI5tEBeSWUJiqjXvBMsxEu&Signature=lnfxkT02wujIvCXVtMn1z%2FLf1FU%3D","intro":"","size":76405,"progress":100,"type":"jpg"},{"name":"2.png","url":"https://storage-product.datatang.com/damp/product/sample_presentation/20250728171244/2.png?Expires=4102415999&OSSAccessKeyId=LTAI5tEBeSWUJiqjXvBMsxEu&Signature=V%2B9qZ5uTxJ0SrEHqswGTY7v6HBo%3D","intro":"","size":78372,"progress":100,"type":"jpg"},{"name":"3.png","url":"https://storage-product.datatang.com/damp/product/sample_presentation/20250728171244/3.png?Expires=4102415999&OSSAccessKeyId=LTAI5tEBeSWUJiqjXvBMsxEu&Signature=TGcBdIpdyEbp87EwlBGD6pwZvMA%3D","intro":"","size":72724,"progress":100,"type":"jpg"}],"officialSummary":"This dataset containing approximately 570,000 question–answer pairs. The data covers 31 established content safety categories (CAC) along with additional emerging risk categories. All samples are written by professional annotators, this dataset can be used for tasks such as large language model training, safety evaluation, and supervised fine-tuning focused on content moderation and risk handling.","dataexampl":null,"datakeyword":["llm content safety dataset","ai content safety data","content safety training data","llm safety dataset","ai moderation dataset","harmful content dataset","llm alignment dataset"],"isDelete":null,"ids":null,"idsList":null,"datasetCode":null,"productStatus":null,"tagTypeEn":"Type","tagTypeZh":null,"website":null,"samplePresentationList":null,"datazyList":null,"keyInformationList":null,"dataexamplList":null,"bgimg":null,"datazyScriptList":null,"datakeywordListString":null,"sourceShowPage":"llm","dataShowType":"[{\"code\":\"0\",\"language\":\"ZH\"},{\"code\":\"1\",\"language\":\"ZH\"},{\"code\":\"2\",\"language\":\"EN,PT,DE,KO,FR,ES\"},{\"code\":\"3\",\"language\":\"EN\"},{\"code\":\"4\",\"language\":\"JP\"}]","productNameEn":"Chinese Large Language Model content safety considerations text data","BGimg":"","voiceBg":["/shujutang/static/image/comm/audio_bg.webp","/shujutang/static/image/comm/audio_bg2.webp","/shujutang/static/image/comm/audio_bg3.webp","/shujutang/static/image/comm/audio_bg4.webp","/shujutang/static/image/comm/audio_bg5.webp"]}
This dataset containing approximately 570,000 question–answer pairs. The data covers 31 established content safety categories (CAC) along with additional emerging risk categories. All samples are written by professional annotators, this dataset can be used for tasks such as large language model training, safety evaluation, and supervised fine-tuning focused on content moderation and risk handling.
This is a paid datasets for commercial use, research purpose and more. Licensed ready made datasets help jump-start AI projects.
Specifications
Data content
Large Language Model content safety considerations text data
Data size
About 570,000 sets of question and answer data; covering 31 categories of CAC + other new categories