89,007 Sets of Japanese–Arabic Image-Text Construction Data
Japanese
Arabic
Visual Question Answering (VQA)
Image Captioning
Optical Character Recognition (OCR)
The product contains a total of 89,007 data samples, each consisting of one image and one JSON document. The JSON document may contain an image caption, a visual question-answering pair, OCR results extracted from the image, or a visual question-answering pair based on the OCR results. The dataset covers Arabic and Japanese and spans six domains: ① Business and Finance, ② Coding and Computer Science, ③ Law, Government, and Politics, ④ Science, Technology, Engineering, and Mathematics (STEM), ⑤ Society, Culture, Humanities, and Religion, and ⑥ Sports, Lifestyle, and Leisure. The accuracy of image domain classification (per-image accuracy) is above 95%, the matching degree between images and their text descriptions is greater than 95%, and OCR accuracy (per-sentence accuracy) must exceed 95%. The dataset is suitable for multilingual OCR, multimodal LLM training, image captioning, and multilingual VQA tasks.
This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Data Content
Each data sample consists of one image and one JSON document. The JSON document contains one of the following: the OCR text recognition results for the image, a textual description (caption) of the image, visual question answering (VQA) based on the image, or visual question answering based on the image's OCR results. Each visual question answering sample includes at least one round of Q&A.
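The page does not publish the JSON schema, so the sketch below models one sample with hypothetical field names (`image_path`, `task`, `text`, `qa`) and hypothetical task labels (`ocr`, `caption`, `vqa`, `ocr_vqa`) chosen only for illustration. It shows how the four annotation types, and the rule that VQA includes at least one round of Q&A, might be checked when loading the data.

```python
from dataclasses import dataclass, field
from typing import Literal

# Hypothetical labels for the four annotation types named on this page;
# the real JSON keys and values are not published here.
TaskType = Literal["ocr", "caption", "vqa", "ocr_vqa"]


@dataclass
class QATurn:
    question: str
    answer: str


@dataclass
class Sample:
    image_path: str  # the single image in the sample
    task: TaskType   # which annotation type the JSON document holds
    text: str = ""   # OCR transcription or caption (ocr / caption tasks)
    qa: list[QATurn] = field(default_factory=list)  # Q&A rounds (vqa / ocr_vqa)


def validate(sample: Sample) -> list[str]:
    """Return a list of problems; an empty list means the sample looks well-formed."""
    problems = []
    if sample.task in ("ocr", "caption"):
        if not sample.text.strip():
            problems.append(f"{sample.task} sample has no text")
    elif sample.task in ("vqa", "ocr_vqa"):
        # The page states that VQA includes at least one round of Q&A.
        if len(sample.qa) < 1:
            problems.append(f"{sample.task} sample has no Q&A rounds")
    else:
        problems.append(f"unknown task type: {sample.task!r}")
    return problems
```

For example, `validate(Sample("0002.jpg", "vqa"))` reports a missing Q&A round, while a caption sample with non-empty text passes.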
Data Scale
89,007 sets in total, including 42,094 sets in Arabic and 46,913 sets in Japanese.
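The stated figures can be cross-checked in a few lines. The per-language shares below are derived from the page's own counts; they are not separately published numbers.

```python
# Per-language counts as stated on this page.
counts = {"Arabic": 42_094, "Japanese": 46_913}
total = 89_007

# The two language subsets should add up to the stated total.
assert sum(counts.values()) == total

# Share of the dataset contributed by each language, as a percentage.
shares = {lang: round(100 * n / total, 1) for lang, n in counts.items()}
print(shares)  # {'Arabic': 47.3, 'Japanese': 52.7}
```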
Category Distribution
The dataset includes two languages, Japanese and Arabic, and covers four task categories for each language: Image Captioning, Visual Question Answering, Optical Character Recognition, and OCR-based Visual Question Answering. Each category is further divided into six domains: ① Business and Finance, ② Coding and Computer Science, ③ Law, Government, and Politics, ④ Science, Technology, Engineering, and Mathematics (STEM), ⑤ Society, Culture, Humanities, and Religion, and ⑥ Sports, Lifestyle, and Leisure.
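Taken together, the two languages, four task categories, and six domains form a grid of 2 × 4 × 6 = 48 (language, task, domain) cells. The page does not give per-cell counts, so the sketch below only enumerates the grid, for example as a starting point for a coverage check or a stratified train/test split.

```python
from itertools import product

LANGUAGES = ["Japanese", "Arabic"]
TASKS = [
    "Image Captioning",
    "Visual Question Answering",
    "Optical Character Recognition",
    "OCR-based Visual Question Answering",
]
DOMAINS = [
    "Business and Finance",
    "Coding and Computer Science",
    "Law, Government, and Politics",
    "Science, Technology, Engineering, and Mathematics",
    "Society, Culture, Humanities, and Religion",
    "Sports, Lifestyle, and Leisure",
]

# Every (language, task, domain) combination described by the taxonomy.
strata = list(product(LANGUAGES, TASKS, DOMAINS))
print(len(strata))  # 48
```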
Data Format
Images in JPG or other common image formats; annotations in JSON format.
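The page does not describe the delivery folder layout. Assuming each image sits next to a JSON file with the same file stem (for example `0001.jpg` and `0001.json`), the hypothetical helper `pair_samples` below matches them up. It reads annotations as UTF-8, which matters for Japanese and Arabic text, and treats only the listed extensions as images; that extension list is itself an assumption.

```python
import json
from pathlib import Path

# Assumed image extensions; the page says "JPG or other common image formats".
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def pair_samples(root: Path) -> list[tuple[Path, dict]]:
    """Match each image under `root` with a same-stem .json annotation file.

    The same-stem convention is a hypothetical layout for illustration,
    not the documented delivery structure.
    """
    pairs = []
    for img in sorted(root.rglob("*")):
        if img.suffix.lower() not in IMAGE_EXTS:
            continue  # skip directories, JSON files, and other non-images
        ann_path = img.with_suffix(".json")
        if not ann_path.exists():
            continue  # skip images that have no annotation file
        with ann_path.open(encoding="utf-8") as f:
            pairs.append((img, json.load(f)))
    return pairs
```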
Collection Accuracy
The accuracy of image domain classification (per-image accuracy) is above 95%.
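The page does not describe how the per-image figure is audited. Assuming a reviewer re-labels a set of images, and an image counts as correct when its delivered domain label matches the reviewer's, per-image accuracy is the number of matches divided by the number of images reviewed. The function names below are hypothetical.

```python
def per_image_accuracy(labels: list[str], audited: list[str]) -> float:
    """Fraction of images whose delivered domain label matches the reviewer's label."""
    if len(labels) != len(audited):
        raise ValueError("label lists must have the same length")
    if not labels:
        raise ValueError("no images to score")
    correct = sum(a == b for a, b in zip(labels, audited))
    return correct / len(labels)


def meets_threshold(acc: float, threshold: float = 0.95) -> bool:
    # The page states accuracy is "above 95%", read here as a strict inequality.
    return acc > threshold
```

For example, 97 matching labels out of 100 reviewed images gives 0.97, which clears the 95% bar.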
Annotation Accuracy
The matching degree between each image and its text description is greater than 95%. OCR accuracy (per-sentence accuracy) must exceed 95%. For this measure, text is divided into sentences at punctuation marks (such as commas, semicolons, and exclamation marks) or at titles and headings, and accuracy is counted over those sentences.
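One way to read the per-sentence OCR criterion: split both the delivered transcription and a reference transcription at punctuation, then count how many reference segments are reproduced exactly. The sketch below assumes a specific delimiter set (Latin, Japanese, and Arabic punctuation, with line breaks standing in for titles and headings), an exact string match per segment, and alignment with Python's `difflib.SequenceMatcher`. None of these details come from the page, and the helper names are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Assumed delimiters: Latin , ; ! ? . plus Japanese 、。，；！？ and Arabic ، ؛ ؟.
# Line breaks approximate title/heading boundaries. Note that '.' also splits
# decimals such as 3.14; a production scorer would need to guard against that.
DELIMS = re.compile(r"[,;!?.、。，；！？،؛؟\n]")


def segment(text: str) -> list[str]:
    """Split text at the assumed delimiters and drop empty pieces."""
    return [s.strip() for s in DELIMS.split(text) if s.strip()]


def per_sentence_accuracy(predicted: str, reference: str) -> float:
    """Fraction of reference segments reproduced exactly, after alignment."""
    pred, ref = segment(predicted), segment(reference)
    if not ref:
        raise ValueError("reference text has no segments")
    # autojunk=False so frequent segments are never treated as junk.
    matcher = SequenceMatcher(None, ref, pred, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)


print(per_sentence_accuracy("売上は増加した。利益も減った！", "売上は増加した。利益も増えた！"))  # 0.5
```

Aligning segments with `SequenceMatcher`, rather than comparing them position by position, keeps a single dropped segment from marking every later segment as wrong.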