89,007 Sets of Japanese–Arabic Image-Text Construction Data

Japanese
Arabic
Visual Question Answering (VQA)
Image Captioning
Optical Character Recognition (OCR)

The product contains a total of 89,007 data samples, each consisting of one image and one JSON document. The JSON document may contain an image caption, a visual question-answering pair, OCR results extracted from the image, or a visual question-answering pair based on the OCR results. The dataset covers Arabic and Japanese and spans six domains: ① Business and Finance, ② Coding and Computer Science, ③ Law, Government, and Politics, ④ Science, Technology, Engineering, and Mathematics (STEM), ⑤ Society, Culture, Humanities, and Religion, ⑥ Sports, Lifestyle, and Leisure. The accuracy of image domain classification (per-image accuracy) is above 95%; the matching degree between image and text description is greater than 95%; and OCR recognition accuracy (per-sentence accuracy) exceeds 95%. The dataset is suitable for multilingual OCR, multimodal LLM training, image captioning, and multilingual VQA tasks.

Paid Datasets
This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Data Content
Each data sample consists of one image and one JSON document. The JSON document contains one of the following: OCR text recognition results for the image, a textual description (caption) of the image, visual question answering (VQA) based on the image, or visual question answering based on the OCR recognition results of the image. Each visual question answering annotation includes at least one round of Q&A.
Data Scale
89,007 sets in total, including 42,094 sets in Arabic and 46,913 sets in Japanese.
Category Distribution
The dataset includes two languages, Japanese and Arabic, and covers four task categories for each language: Image Captioning, Visual Question Answering, Optical Character Recognition, and OCR-based Visual Question Answering. Each category is further divided into six domains: ① Business and Finance, ② Coding and Computer Science, ③ Law, Government, and Politics, ④ Science, Technology, Engineering, and Mathematics, ⑤ Society, Culture, Humanities, and Religion, ⑥ Sports, Lifestyle, and Leisure.
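The category structure described above (2 languages, 4 task categories, 6 domains) can be enumerated as a grid, which may be handy for stratified sampling or per-cell statistics. This is only an illustrative sketch: the label strings are taken from the description, but how they are actually encoded in the delivered files is not stated here, and the function name is hypothetical.

```python
from itertools import product

# Labels copied from the listing; their on-disk encoding is an assumption.
LANGUAGES = ["Japanese", "Arabic"]
TASKS = [
    "Image Captioning",
    "Visual Question Answering",
    "Optical Character Recognition",
    "OCR-based Visual Question Answering",
]
DOMAINS = [
    "Business and Finance",
    "Coding and Computer Science",
    "Law, Government, and Politics",
    "Science, Technology, Engineering, and Mathematics",
    "Society, Culture, Humanities, and Religion",
    "Sports, Lifestyle, and Leisure",
]


def category_grid():
    """Return every (language, task, domain) combination as a list of tuples."""
    return list(product(LANGUAGES, TASKS, DOMAINS))
```

The grid has 2 × 4 × 6 = 48 cells. The listing does not say how the 89,007 samples are distributed across them.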
Data Format
Images in JPG or other common image formats; annotations in JSON format.
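Since each sample pairs one image with one JSON document, a loader might look roughly like the sketch below. It assumes that an image and its annotation share a base file name (for example `0001.jpg` and `0001.json`) and that the JSON is UTF-8 encoded; neither detail is confirmed by this listing, and the suffix set and function name are hypothetical.

```python
import json
from pathlib import Path

# Assumed set of "common image formats"; adjust to the delivered files.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def iter_samples(root):
    """Yield (image_path, annotation) for each image with a same-named JSON file."""
    root = Path(root)
    for image_path in sorted(root.rglob("*")):
        if not image_path.is_file():
            continue
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        json_path = image_path.with_suffix(".json")
        if not json_path.is_file():
            # Skip images without an annotation instead of failing.
            continue
        with json_path.open(encoding="utf-8") as fh:
            yield image_path, json.load(fh)
```

The annotation is returned as parsed JSON without interpreting its fields, because the JSON schema is not documented here.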
Collection Accuracy
The accuracy of image domain classification (per-image accuracy) is above 95%.
Annotation Accuracy
The matching degree between image and text description is greater than 95%; OCR recognition accuracy (per-sentence accuracy) exceeds 95%. For per-sentence accuracy, text is segmented at punctuation marks (such as commas, semicolons, and exclamation marks) or at titles/headings.
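One possible reading of the per-sentence accuracy measure is sketched below: split the reference text and the recognized text at punctuation, then count the reference segments that are reproduced exactly. The delimiter set (ASCII, Japanese, and Arabic punctuation, plus line breaks as a stand-in for title boundaries), the position-by-position comparison, and the function names are all assumptions. The vendor's actual scoring procedure is not described in this listing.

```python
import re

# Assumed delimiters: ASCII , ; ! ? . plus Japanese and Arabic equivalents.
# Note that including "." also splits decimal numbers such as "3.14".
_DELIMS = re.compile(
    r"[,;!?.\u3001\u3002\uff01\uff1f\uff1b\uff0c\u060c\u061b\u061f\n]+"
)


def split_segments(text):
    """Split text at punctuation and drop empty segments."""
    return [seg.strip() for seg in _DELIMS.split(text) if seg.strip()]


def per_sentence_accuracy(reference, recognized):
    """Fraction of reference segments matched exactly at the same position."""
    ref = split_segments(reference)
    hyp = split_segments(recognized)
    if not ref:
        return 1.0 if not hyp else 0.0
    correct = sum(
        1 for i, seg in enumerate(ref) if i < len(hyp) and hyp[i] == seg
    )
    return correct / len(ref)
```

Positional matching is deliberately naive: a single dropped or extra segment shifts everything after it, so a production scorer would likely align the two segment lists first.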