Bilingual Image Caption Dataset - 2.4 Million Pairs
image caption data
image captioning dataset
image text dataset
multimodal dataset
vision language dataset
This dataset consists of about 2.4 million image-text pairs. The images cover various categories, including landscapes, animals, flowers and trees, people, cars, sports, industry, and architecture, along with an aesthetic subset. Each image is paired with descriptive captions in both English and Chinese, covering overall scene understanding, local visual details, and high-level emotional context.
This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Data size: 2.4 million image-description pairs
Image type: landscapes, animals, flowers and trees, people, cars, sports, industry, and architecture
Data format: images are .jpg; text is .txt
Text length: in principle, each description is no less than 200 Chinese characters
Main description content: the overall scene of the picture, a detailed description of the elements within the scene, and the emotions conveyed by the picture
Accuracy rate: no less than 95% of images are correctly labeled
Image resolution: no less than 2 million pixels; most images exceed 5 million pixels
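The format and text-length specifications above suggest a simple sanity check after delivery. The sketch below is illustrative only. It assumes a hypothetical layout in which each .jpg has a UTF-8 .txt file with the same stem in the same folder, which the listing does not confirm. The function names are invented for this example and are not part of any vendor tooling.

```python
from pathlib import Path

# From the "Text length" specification; applies to the Chinese description.
MIN_CJK_CHARS = 200


def count_cjk(text):
    """Count characters in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")


def pair_images_with_captions(root):
    """Pair each .jpg under root with a same-stem .txt file (assumed layout).

    Images without a caption file are skipped.
    """
    pairs = []
    for image in sorted(Path(root).rglob("*.jpg")):
        caption = image.with_suffix(".txt")
        if caption.exists():
            pairs.append((image, caption))
    return pairs


def short_captions(pairs, minimum=MIN_CJK_CHARS):
    """Return caption paths whose Chinese character count is below minimum."""
    return [
        caption
        for _, caption in pairs
        if count_cjk(caption.read_text(encoding="utf-8")) < minimum
    ]


if __name__ == "__main__":
    # "dataset_root" is a placeholder path, not a documented location.
    found = pair_images_with_captions("dataset_root")
    print(f"{len(found)} pairs, {len(short_captions(found))} captions under the minimum")
```

Only CJK ideographs are counted, so the check still works if a caption file also contains English text. Because the 200-character minimum is stated "in principle", captions flagged here are candidates for review rather than definite defects.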