Image Caption Dataset - 814K Images of General Scenes
image caption dataset for llm
general scene image caption dataset
chinese image caption dataset
multimodal image text data
image description dataset
This dataset contains 814,312 image–text pairs covering a wide range of general scene categories, including landscapes, animals, flowers and trees, people, cars, sports, industry, and architecture, as well as an aesthetic subset. Each image is annotated with at least two single-sentence Chinese descriptions; a small number of images have only one description. The data is suitable for image captioning, vision–language model training, and multimodal understanding.
This is a paid dataset for commercial use, research purposes, and more. Licensed ready-made datasets help jump-start AI projects.
Specifications
Data size: 700 thousand sets of images and descriptions
Image type: landscapes, animals, flowers and trees, people, cars, sports, industry, and architecture, as well as an aesthetic subset
Data format: images are .jpg, text is .txt
Description language: Chinese, English
Text length: in principle, each sentence is 5-20 characters long, and each image has at least two types of description, one sentence each; a few images have only one description
Main description content: the main scene or some salient features of the image
Accuracy rate: at least 95% of images are correctly labeled
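
The format and text-length specifications above could be spot-checked after delivery with a short script. The following is a minimal sketch, not the vendor's tooling. It assumes a layout the listing does not confirm: each `.jpg` has a `.txt` file with the same base name, the text is UTF-8, and each line holds one description. The function names (`read_captions`, `check_pair`, `scan`) are hypothetical, and length is counted in Unicode code points, which fits Chinese captions better than English ones.

```python
from pathlib import Path
import tempfile

# Nominal per-sentence length taken from the "Text length" specification.
MIN_CHARS, MAX_CHARS = 5, 20


def read_captions(txt_path: Path) -> list[str]:
    """Return non-empty lines (assumption: UTF-8, one description per line)."""
    text = txt_path.read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def check_pair(captions: list[str]) -> list[str]:
    """Return human-readable issues for the descriptions of one image."""
    issues = []
    if not captions:
        issues.append("no captions")
    elif len(captions) < 2:
        # The listing allows this for a few images, so treat it as a note.
        issues.append("only one caption")
    for caption in captions:
        if not MIN_CHARS <= len(caption) <= MAX_CHARS:
            issues.append(
                f"length {len(caption)} outside {MIN_CHARS}-{MAX_CHARS}: {caption!r}"
            )
    return issues


def scan(root: Path) -> dict[str, list[str]]:
    """Map each .jpg file name under root to the issues found for it."""
    report = {}
    for image in sorted(root.rglob("*.jpg")):
        # Assumption: the caption file shares the image's base name.
        txt = image.with_suffix(".txt")
        captions = read_captions(txt) if txt.exists() else []
        report[image.name] = check_pair(captions)
    return report


if __name__ == "__main__":
    # Tiny self-contained demo on made-up files, not real dataset samples.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "demo.jpg").write_bytes(b"")
        (root / "demo.txt").write_text(
            "一只猫坐在窗台上\n窗台上有一只小猫\n", encoding="utf-8"
        )
        for name, issues in scan(root).items():
            print(name, issues or "ok")
```

Because the listing says a few images carry only one description and states the length range "in principle", the sketch reports these cases rather than rejecting them. If the delivered layout differs (for example, one annotation file per folder), only `scan` and `read_captions` would need to change.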