300 million pairs of high-quality image-caption dataset
multimodal
image
description
300 million images, each corresponding to a description. All are genuine image works published by photographers. The vast majority of descriptions are in English, with very few in Chinese.
This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Data size
300 million images, each paired with a textual description. The complete image library (photographic and vector images) totals nearly 300 million; the full dataset available for generative AI training (curated photographic and vector images, excluding editorial/news images) comprises approximately 100 million.