LongContext Reasoning Dataset

Content: Long-document Multi-hop Reasoning QA Dataset
Data Size: 7,500
Data Fields: id, context, file_count, question, answer, reasoning_chain, supporting_evidence, hops
Language: ZH, EN, KO
Format: JSON
Tags: LongContext, Reasoning
This dataset targets core weaknesses of today's large language models in processing long documents and performing complex reasoning. It consists of 7,500 high-quality training examples in three languages: Chinese, English, and Korean. Each instance is built around a long-text passage and includes questions that require synthesizing information across paragraphs and documents while following multi-step logical chains. The goal is to offer a thorough and rigorous evaluation framework that tests a model's ability to perceive long-range context, retrieve relevant information, construct sound reasoning paths, and trace evidence back to its source.
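As a rough illustration of how one instance might be handled, the sketch below checks a record against the eight field names listed in the dataset's metadata (id, context, file_count, question, answer, reasoning_chain, supporting_evidence, hops). Only the field names and the JSON format come from the listing; the sample record, its values, and the value types (for example, lists for reasoning_chain and supporting_evidence) are assumptions made for illustration, and the helper `missing_fields` is hypothetical.

```python
import json

# Field names as listed in the dataset's "Data Fields" entry.
REQUIRED_FIELDS = (
    "id", "context", "file_count", "question",
    "answer", "reasoning_chain", "supporting_evidence", "hops",
)


def missing_fields(record: dict) -> list[str]:
    """Return the listed field names that are absent from a record."""
    return [name for name in REQUIRED_FIELDS if name not in record]


# Hypothetical record: every value below is made up for illustration,
# and the value types are assumptions, not the vendor's actual schema.
sample_json = """
{
  "id": "example-0001",
  "context": "Document A text. Document B text.",
  "file_count": 2,
  "question": "Placeholder question that spans both documents?",
  "answer": "Placeholder answer",
  "reasoning_chain": ["step 1", "step 2"],
  "supporting_evidence": ["snippet from A", "snippet from B"],
  "hops": 2
}
"""

record = json.loads(sample_json)
print(missing_fields(record))  # → []
```

A check like this can be run over each record after purchase to confirm that the delivered files match the advertised field list before building a training or evaluation pipeline on top of them.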
This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.