4,995 Vietnamese OCR Images Data - Images with Annotation and Transcription

Vietnamese OCR

Multiple scenes

Multiple angles

Different light conditions

4,995 Vietnamese OCR Images Data - Images with Annotation and Transcription. The data includes 258 images of natural scenes, 2,553 Internet images, and 2,184 document images. For line-level content, line-level quadrilateral bounding box annotation and text transcription were adopted; for column-level content, column-level quadrilateral bounding box annotation and text transcription were adopted. The data can be used for tasks such as Vietnamese text recognition in multiple scenes.

This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.

Specifications

Data size

4,995 OCR images, including 258 images of natural scenes, 2,553 Internet images, and 2,184 document images
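
The three category counts above can be checked against the stated total, and their shares worked out, with a few lines of Python. This is only an arithmetic sketch using the figures listed under "Data size"; the dictionary keys are labels chosen for illustration.

```python
# Image counts per source category, as listed under "Data size".
counts = {
    "natural scenes": 258,
    "Internet images": 2553,
    "document images": 2184,
}

total = sum(counts.values())

# Print each category's share of the whole dataset.
for name, n in counts.items():
    print(f"{name}: {n} images ({n / total:.1%} of {total})")
```

The sum matches the advertised 4,995 images, and natural scenes make up only a small fraction of the set.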

Collecting environment

including natural scenes (plaques, packaging instructions, small advertisements, menus, posters, etc.), Internet images (magazine covers, comic covers, etc.), and document images (text documents, etc.)

Data diversity

including multiple scenes, multiple angles, different light conditions

Device

cellphone

Shooting angles

looking-up angle, eye-level angle

Format

the image data format is .jpg, the annotated file format is .json

Annotation content

line-level quadrilateral bounding box annotation and transcription for the texts; column-level quadrilateral bounding box annotation and transcription for the texts
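
The JSON schema of the annotation files is not documented on this page. As a minimal sketch, assuming a hypothetical layout in which each file holds a `regions` list whose entries carry a `level` ("line" or "column"), four `points` as `[x, y]` vertices, and a `text` transcription, the annotations could be read as follows. All field names and the sample values are assumptions for illustration, not the dataset's actual format.

```python
import json

# Hypothetical annotation layout; the real field names in the dataset may differ.
SAMPLE = '''
{
  "image": "1.jpg",
  "regions": [
    {"level": "line",
     "points": [[10, 20], [110, 22], [108, 52], [9, 50]],
     "text": "Xin chào"},
    {"level": "column",
     "points": [[200, 10], [230, 10], [230, 150], [200, 150]],
     "text": "Việt Nam"}
  ]
}
'''

def load_regions(raw):
    """Parse annotation JSON and return a list of (level, quad, text) tuples."""
    doc = json.loads(raw)
    regions = []
    for r in doc["regions"]:
        quad = [tuple(p) for p in r["points"]]
        if len(quad) != 4:
            raise ValueError("expected a quadrilateral with 4 vertices")
        regions.append((r["level"], quad, r["text"]))
    return regions

def enclosing_box(quad):
    """Axis-aligned box (left, top, right, bottom) that encloses a quadrilateral."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return min(xs), min(ys), max(xs), max(ys)

regions = load_regions(SAMPLE)
for level, quad, text in regions:
    print(level, enclosing_box(quad), text)
```

The enclosing box is a common first step for cropping a text region before recognition; a perspective warp of the quadrilateral would preserve more detail for skewed text.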

Accuracy

a bounding box annotation is qualified when the error of each vertex of the quadrilateral is within 10 pixels; the accuracy of bounding boxes is not less than 97%; the text transcription accuracy is not less than 97%
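
One way to read the bounding box criterion is sketched below: a box is qualified when every vertex lies within 10 pixels of the corresponding reference vertex, and box accuracy is the share of qualified boxes. The page does not say how the error is measured, so Euclidean distance and matching vertex order are assumptions here, and the function names are hypothetical.

```python
import math

TOLERANCE_PX = 10  # from the specification: each vertex within 10 pixels

def is_qualified(annotated, reference, tol=TOLERANCE_PX):
    """True if each annotated vertex is within `tol` pixels of its reference vertex.

    Assumes both quadrilaterals list their 4 vertices in the same order and
    that the error is Euclidean distance (neither is stated in the source).
    """
    if len(annotated) != 4 or len(reference) != 4:
        raise ValueError("both boxes must have exactly 4 vertices")
    return all(math.dist(a, r) <= tol for a, r in zip(annotated, reference))

def box_accuracy(pairs, tol=TOLERANCE_PX):
    """Fraction of (annotated, reference) pairs that are qualified."""
    if not pairs:
        raise ValueError("no boxes to evaluate")
    qualified = sum(is_qualified(a, r, tol) for a, r in pairs)
    return qualified / len(pairs)

reference = [(0, 0), (100, 0), (100, 30), (0, 30)]
good = [(3, 4), (100, 0), (98, 31), (0, 30)]   # largest vertex error is 5 px
bad = [(0, 0), (100, 0), (100, 30), (8, 38)]   # last vertex is about 11.3 px off

print(is_qualified(good, reference))
print(is_qualified(bad, reference))
print(box_accuracy([(good, reference), (bad, reference)]))
```

Under this reading, the stated threshold means at least 97% of boxes in the dataset pass `is_qualified`.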

Recommended Dataset

71,535 Images English OCR Data in Natural Scenes

71,535 Images English OCR Data in Natural Scenes. The collecting scenes of this dataset are real scenes in Britain and the United States. The data diversity includes multiple scenes, multiple photographic angles, and multiple light conditions. For annotation, line-level, word-level, and character-level rectangular or quadrilateral bounding box annotation was adopted, along with text transcription. The dataset can be used for English OCR tasks in natural scenes.

OCR English Natural scenes

500,000 Images - Natural Scenes and Documents OCR Data

The dataset consists of 500,000 images for multi-country natural scenes and document OCR, including 20 languages such as Traditional Chinese, Japanese, Korean, Indonesian, Malay, Thai, Vietnamese, Polish, etc. The diversity includes various natural scenarios and multiple shooting angles. This set of data can be used for multi-language OCR tasks.

Natural scenes Documents OCR

30,000 Images - Natural Scenes OCR Data in Southeast Asian Languages

30,000 natural scene OCR images for minority languages in Southeast Asia, including Khmer (Cambodia), Lao, and Burmese. The collection covers a variety of natural scenes and shooting angles. This set of data can be used for Southeast Asian language OCR tasks.

OCR Southeast Asian Languages Natural Scenes

5,000 Images of Turkish Natural Scene OCR Data

5,000 Turkish natural scene OCR images covering a variety of natural scenes and multiple shooting angles. The texts are annotated with quadrilateral or polygon bounding boxes and transcriptions. This data can be used for tasks such as Turkish OCR.

OCR Turkish Natural scenes

8,604 Images of Arabic Natural Scene OCR Data

8,604 Arabic natural scene OCR images covering a variety of natural scenes and multiple shooting angles. The texts are annotated with quadrilateral or polygon bounding boxes and transcriptions. This data can be used for tasks such as Arabic OCR.

Arabic Multiple natural scenes Multiple shooting angles

104,320 Images - Korean and Hindi OCR Data in Natural Scenes

104,320 Images - Korean and Hindi OCR Data in Natural Scenes. The collecting scenes of this dataset include packaging, posters, tickets, reminders, menus, building signs, etc. The data diversity includes multiple scenes, multiple shooting angles, and multiple light conditions. Annotation covers line-level polygon (or tetragon or rectangle) bounding boxes, transcription, and text attributes (language type) for the texts, as well as vertical-level polygon (or tetragon or rectangle) bounding boxes, transcription, and text attributes (language type) for the texts. The dataset can be used for Korean and Hindi OCR tasks in natural scenes.

Multiple natural scenes Multiple shooting angles Multiple light conditions

57,645 Images - Vertical OCR Data in Text Scenes

57,645 Images - Vertical OCR Data in Text Scenes. The collecting scenes of this dataset include street scenes, plaques, billboards, posters, decorations, art lettering, magazine covers, etc. The language distribution is Chinese with a small amount of English. Annotation covers vertical-level rectangular (or polygonal or parallelogram) bounding boxes and transcription for the texts, as well as non-vertical rectangular (or polygonal or parallelogram) bounding boxes and transcription for the texts. This dataset can be used for tasks such as OCR in multiple vertical text scenes.

OCR Multiple scenes Multiple fonts

105,941 Images Natural Scenes OCR Data of 12 Languages

105,941 Images Natural Scenes OCR Data of 12 Languages. The data covers 12 languages (6 Asian languages, 6 European languages), multiple natural scenes, and multiple photographic angles. The texts are annotated with line-level quadrilateral bounding boxes and transcriptions. The data can be used for tasks such as multi-language OCR.

12 languages Multiple photographic angles Multiple scenes Line-level quadrilateral bounding box annotation and transcription
