300M Image-Caption Pairs – Large-Scale Vision-Language Dataset for AI Training
image-caption dataset
image-text pairs
vision-language data
generative AI training dataset
multimodal AI dataset
image description data
LLM vision data
AI image-text alignment
high-quality image data
The 300 Million Pairs of High-Quality Image-Caption Dataset is a large-scale collection of photographic and vector images, each paired with a textual description. The complete image library comprises nearly 300 million images, of which a curated subset of approximately 100 million image-caption pairs is available for generative AI and vision-language model training. All images are authentic, legally licensed works created by professional photographers. Captions are predominantly English, with a small Chinese portion, and the images cover diverse scenes, objects, and compositions suitable for tasks such as image captioning, visual question answering (VQA), image-text retrieval, and multimodal foundation model pretraining. The dataset supports large-scale LLM and VLM applications and complies with global data privacy and copyright regulations, including GDPR, CCPA, and PIPL.
This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Data size
Nearly 300 million images in the complete library (photographic and vector), each paired with a textual description. The portion available for generative AI training (curated photographic and vector images, excluding editorial/news images) comprises approximately 100 million.