[{"@type":"PropertyValue","name":"Quantity of Data","value":"100,000"},{"@type":"PropertyValue","name":"Data use","value":"Instruction-Following Evaluation for Chinese LLM"},{"@type":"PropertyValue","name":"Data content","value":"A variety of complex prompt instructions, between 50 and 400 words, with no fewer than 3 constraints in each prompt"},{"@type":"PropertyValue","name":"Production method","value":"All prompt are manually written to satisfy the diversity of coverage"},{"@type":"PropertyValue","name":"Language","value":"Chinese"}]
{"id":1456,"datatype":"1","titleimg":"https://www.nexdata.ai/shujutang/static/image/index/datatang_tuxiang_default.webp","type1":"226","type1str":null,"type2":"228","type2str":null,"dataname":"100K Chinese LLM Instruction-Following Dataset","datazy":[{"title":"Quantity of Data","content":"100,000","desc":"Quantity of Data"},{"title":"Data use","content":"Instruction-Following Evaluation for Chinese LLM","desc":"Data use"},{"title":"Data content","content":"A variety of complex prompt instructions, between 50 and 400 words, with no fewer than 3 constraints in each prompt","desc":"Data content"},{"title":"Production method","content":"All prompt are manually written to satisfy the diversity of coverage","desc":"Production method"},{"title":"Language","content":"Chinese","desc":"Language"}],"datatag":"LLM,Instruction-Following,SFT","technologydoc":null,"downurl":null,"datainfo":null,"standard":null,"dataylurl":null,"flag":null,"publishtime":null,"createby":null,"createtime":null,"ext1":null,"samplestoreloc":null,"hosturl":null,"datasize":null,"industryPlan":null,"keyInformation":"","samplePresentation":[{"name":"生成类样例.png","url":"https://storage-product.datatang.com/damp/product/samplePresentation_ipad/20250718135224/%E7%94%9F%E6%88%90%E7%B1%BB%E6%A0%B7%E4%BE%8B.png?Expires=4102415999&OSSAccessKeyId=LTAI5tEBeSWUJiqjXvBMsxEu&Signature=j2dYaUZzrFuOBpHy%2BbE9IzU0uZU%3D","intro":"","size":58978,"progress":100,"type":"jpg"},{"name":"提取类样例.png","url":"https://storage-product.datatang.com/damp/product/samplePresentation_ipad/20250718135224/%E6%8F%90%E5%8F%96%E7%B1%BB%E6%A0%B7%E4%BE%8B.png?Expires=4102415999&OSSAccessKeyId=LTAI5tEBeSWUJiqjXvBMsxEu&Signature=H5qxm0wOjJhXG2j%2Fs%2FjYogclTzE%3D","intro":"","size":29336,"progress":100,"type":"jpg"},{"name":"摘要类样例.png","url":"https://storage-product.datatang.com/damp/product/samplePresentation_ipad/20250718135224/%E6%91%98%E8%A6%81%E7%B1%BB%E6%A0%B7%E4%BE%8B.png?Expires=4102415999&OSSAccessKeyId=LTAI5tEBeSWUJiqjXvBMsxEu&Signature=vLGxT%2FdoEGEPX5%2F0WVVEAF7y7AE%3D","intro":"","size":65076,"progress":100,"type":"jpg"}],"officialSummary":"This dataset contains 50-400 words, with each prompt containing at least three constraints to train and improve the instruction-following performance of large models. Categories cover generation (news releases, interview outlines, copywriting, manuscript proofreading, Chinese-English essays, grammar learning, research reports, study plans, poetry writing, food descriptions, advertising copy, sales scripts, official document writing assistance, official document review, policy document Q&A, etc.), rewriting (sentence rewriting, text correction, sentence merging, copywriting simplification), summarizing (content summarization), and extraction (event element extraction, opinion extraction, keyword extraction, stance extraction, entity extraction). All prompts are manually compiled to ensure diverse coverage. The dataset is suitable for systematic benchmarking and model assessment.","dataexampl":null,"datakeyword":["LLM evaluation dataset","Chinese LLM instruction following dataset","Instruction-following prompt dataset","Prompt benchmark dataset LLM"],"isDelete":null,"ids":null,"idsList":null,"datasetCode":null,"productStatus":null,"tagTypeEn":"Type","tagTypeZh":null,"website":null,"samplePresentationList":null,"datazyList":null,"keyInformationList":null,"dataexamplList":null,"bgimg":null,"datazyScriptList":null,"datakeywordListString":null,"sourceShowPage":"llm","dataShowType":"[{\"code\":\"0\",\"language\":\"ZH\"},{\"code\":\"1\",\"language\":\"ZH\"},{\"code\":\"2\",\"language\":\"EN,JP,PT,DE,KO,FR,ES\"},{\"code\":\"3\",\"language\":\"EN\"},{\"code\":\"4\",\"language\":\"JP\"}]","productNameEn":"100,000 Instruction-Following Evaluation SFT for Chinese LLM Text Data","BGimg":"","voiceBg":["/shujutang/static/image/comm/audio_bg.webp","/shujutang/static/image/comm/audio_bg2.webp","/shujutang/static/image/comm/audio_bg3.webp","/shujutang/static/image/comm/audio_bg4.webp","/shujutang/static/image/comm/audio_bg5.webp"],"firstList":[{"name":"重写类样例.png","url":"https://storage-product.datatang.com/damp/product/samplePresentation_ipad/20250718135224/%E9%87%8D%E5%86%99%E7%B1%BB%E6%A0%B7%E4%BE%8B.png?Expires=4102415999&OSSAccessKeyId=LTAI5tEBeSWUJiqjXvBMsxEu&Signature=SgPlb%2FRyoOnK4YX2Efnr0ZuCJgY%3D","intro":"","size":32077,"progress":100,"type":"jpg"}]}
This dataset contains 50-400 words, with each prompt containing at least three constraints to train and improve the instruction-following performance of large models. Categories cover generation (news releases, interview outlines, copywriting, manuscript proofreading, Chinese-English essays, grammar learning, research reports, study plans, poetry writing, food descriptions, advertising copy, sales scripts, official document writing assistance, official document review, policy document Q&A, etc.), rewriting (sentence rewriting, text correction, sentence merging, copywriting simplification), summarizing (content summarization), and extraction (event element extraction, opinion extraction, keyword extraction, stance extraction, entity extraction). All prompts are manually compiled to ensure diverse coverage. The dataset is suitable for systematic benchmarking and model assessment.
This is a paid datasets for commercial use, research purpose and more. Licensed ready made datasets help jump-start AI projects.
Specifications
Quantity of Data
100,000
Data use
Instruction-Following Evaluation for Chinese LLM
Data content
A variety of complex prompt instructions, between 50 and 400 words, with no fewer than 3 constraints in each prompt
Production method
All prompt are manually written to satisfy the diversity of coverage