Bilingual Image Caption Dataset

89K Japanese-Arabic Image-Text Dataset for Multimodal LLM Training

The dataset comprises 89,007 samples, each consisting of an image and a JSON document. The JSON document may contain image descriptions, visual question-answering pairs, OCR results extracted from the image, or visual question-answering pairs based on the OCR results. The dataset covers Arabic and Japanese and spans six major domains: ① Business and Finance, ② Coding and Computer Science, ③ Law, Government, and Politics, ④ Science, Technology, Engineering, and Mathematics (STEM), ⑤ Society, Culture, Humanities, and Religion, and ⑥ Sports, Lifestyle, and Leisure. Per-image classification accuracy, image-text matching accuracy, and per-sentence OCR recognition accuracy each exceed 95%. The dataset is suitable for multilingual OCR, multimodal large language model (LLM) training, image captioning, and multilingual visual question answering (VQA) tasks.
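As a rough illustration of the sample structure described above, the sketch below pairs each image with its JSON document and reports which annotation types that document carries. The listing does not publish the JSON schema or the directory layout, so the key names (`caption`, `vqa`, `ocr`, `ocr_vqa`) and the "same file stem" pairing rule are hypothetical and would need to be adjusted to the delivered files.

```python
import json
from pathlib import Path

# Hypothetical field names: the listing does not publish the JSON schema,
# so these keys are placeholders to be replaced with the real ones.
ANNOTATION_KEYS = {
    "caption": "image description",
    "vqa": "visual question-answering pairs",
    "ocr": "OCR results",
    "ocr_vqa": "OCR-based question-answering pairs",
}


def annotation_types(doc: dict) -> list[str]:
    """Return the annotation types that are present and non-empty in one JSON document."""
    return [label for key, label in ANNOTATION_KEYS.items() if doc.get(key)]


def iter_samples(root: Path):
    """Yield (image_path, json_document) pairs.

    Assumes (not confirmed by the listing) that each image sits next to a
    JSON file with the same stem, e.g. 0001.jpg and 0001.json.
    """
    for json_path in sorted(root.glob("*.json")):
        images = sorted(
            p for p in root.glob(json_path.stem + ".*")
            if p.suffix.lower() != ".json"
        )
        if images:
            with json_path.open(encoding="utf-8") as fh:
                yield images[0], json.load(fh)
```

Reading the JSON as UTF-8 matters here because the annotations are in Arabic and Japanese. Calling `annotation_types` on each document from `iter_samples` gives a quick count of how many samples carry captions, VQA pairs, or OCR output before committing to a training split.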

image text dataset, multimodal dataset, vision language dataset, image caption dataset, vlm training data, multimodal llm dataset

30M High-Quality Video Dataset – Copyright-Cleared & Commercial-Ready

This dataset features 30 million high-resolution video clips sourced from authorized and legally compliant channels. Each video offers exceptional clarity, color accuracy, and scene diversity across various environments and themes. All clips include clearly documented copyright ownership and commercial usage rights, making them safe and reliable for both academic research and business applications. The dataset is ideal for computer vision tasks such as video classification, action recognition, multimodal learning, object tracking, and content generation. Whether you're training deep learning models or sourcing clean, scalable visual data, this video dataset offers a comprehensive, large-scale solution.

video dataset, high-resolution video dataset, AI training video data, computer vision dataset, commercial video dataset, deep learning video data, video classification dataset, multimodal video dataset, large-scale video dataset

80M Vector Image Dataset – AI Training & Commercial Use

This dataset contains 80 million high-quality vector images (SVG, EPS, AI formats), offering a vast collection for use in computer vision, machine learning, and creative applications. Each image is copyright-cleared and legally sourced through authorized channels, with transparent usage rights for both commercial and academic purposes. The dataset features a wide variety of vector content—icons, illustrations, infographics, and more—with excellent color fidelity and scalable resolution. Ideal for AI model training (e.g., image classification, object recognition), generative design models, and creative design inspiration, this resource ensures traceable IP rights and enables safe, large-scale usage in real-world environments.

vector image dataset, AI training vector graphics, royalty-free vector images, commercial-use vector dataset, SVG dataset for machine learning, computer vision image dataset, large vector image collection, image recognition dataset, scalable vector dataset

200 Million High-Quality Image Dataset for AI and Computer Vision

This dataset contains 200 million professionally reviewed, high-quality images. The images are diverse in type, with high resolution and clarity, accurate color, and rich detail. All materials were legally obtained through authorized channels, with copyright ownership and the scope of usage authorization clearly indicated. The entire collection carries commercial-grade usage rights and is also cleared for scientific research use, so intellectual property attribution is clear and traceable. The collection supports computer vision research, training of image recognition algorithms, and sourcing of material for creative design.

large image dataset, AI image dataset, commercial image dataset, computer vision image dataset, image recognition training dataset, high-quality image dataset

7 Million Sets - High Quality Video Caption Dataset

This dataset contains 7 million high-quality, authentic videos released by photographers around the world. It includes 6 million videos with English captions and 1 million with Chinese captions. The videos cover a variety of categories such as people, landscapes, and animals. Resolution is above 1080p. The data is suitable for video captioning, vision-language model training, and multimodal understanding.

video caption dataset, video captioning dataset, video caption data, video text dataset, multimodal video dataset, vision language video dataset

300M Image-Caption Pairs – Large-Scale Vision-Language Dataset for AI Training

This dataset is a large-scale collection of photographic and vector images paired with English textual descriptions. The complete image library comprises nearly 300 million images, of which a curated subset of 100 million high-quality image-caption pairs is available for generative AI and vision-language model training. All images are authentic, legally licensed works created by professional photographers. Captions are primarily in English, with a small number in Chinese. The images offer diverse scenes, objects, and compositions suitable for tasks such as image captioning, visual question answering (VQA), image-text retrieval, and multimodal foundation model pretraining. The dataset supports large-scale LLM and VLM applications and complies with global data privacy and copyright regulations, including GDPR, CCPA, and PIPL.

image-caption dataset, image-text pairs, vision-language data, generative AI training dataset, multimodal AI dataset, image description data, LLM vision data, AI image-text alignment, high-quality image data

Bilingual Image Caption Dataset - 2.4 Million Pairs

image caption data, image captioning dataset, image text dataset, multimodal dataset, vision language dataset
