OCR Datasets

Instantly enhance AI model performance with high quality off-the-shelf datasets.

Data Type

All: 30
Document: 2
General Scenario: 13
Handwriting: 15
Internet image: 3
Invoice: 2
Others: 4
Test paper: 2
Table: 1

Language

All: 30
Chinese: 8
English: 4
Hindi: 4
Japanese: 7
Korean: 7
Others: 21
Vietnamese: 4

500,000 Images – Multilingual OCR Dataset in 21 Languages

This dataset covers 21 languages, with 20,000 to 25,000 images per language. The data includes natural scenes, photographed documents, and electronic screen scenes, and it varies in data type, shooting angle, and language. Annotation consists of quadrilateral or polygonal bounding regions and content transcription, both at the line (column) level. The dataset can be used for multilingual optical character recognition (OCR) and text detection tasks.
multilingual OCR dataset, scene text recognition data, document OCR dataset, electronic screen OCR data, OCR dataset 21 languages, AI OCR training data, text recognition dataset

14,511 Images English Handwriting OCR Dataset

The text carriers include A4 paper, lined paper, English paper, etc. Images were captured with cellphones at an eye-level angle. The content includes English compositions, poetry, prose, news, stories, etc. Each text line is annotated with a quadrilateral bounding box and transcribed. The dataset can be used for tasks such as English handwriting OCR.
English OCR dataset, English handwriting OCR dataset, English HTR dataset, OCR training dataset

5,147 Images Japanese Handwriting OCR Dataset

The text carriers include A4 paper, lined paper, quadrille paper, etc. Images were captured with cellphones at an eye-level angle. The content includes Japanese compositions, poetry, prose, news, stories, etc. Each text line is annotated with a quadrilateral bounding box and transcribed. The dataset can be used to train Japanese OCR models and handwritten text recognition systems.
Japanese OCR dataset, Japanese handwriting OCR dataset, Japanese HTR dataset, OCR training dataset

1,000 People - Italian Handwriting OCR Dataset

The writers are Europeans who regularly write in Italian. The samples were captured with a scanner at an eye-level angle. The content includes addresses, company names, and personal names. The dataset can be used to train Italian OCR models and handwritten text recognition systems.
Italian OCR dataset, Italian handwriting OCR dataset, Italian HTR dataset, OCR training dataset

1,000 People - German Handwriting OCR Dataset

The writers are Europeans who regularly write in German. The samples were captured with a scanner at an eye-level angle. The content includes addresses, company names, and personal names. The dataset can be used to train German OCR models and handwritten text recognition systems.
German OCR dataset, German handwriting OCR dataset, German HTR dataset, OCR training dataset

1,000 People - French Handwriting OCR Dataset

The writers are Europeans who regularly write in French. The samples were captured with a scanner at an eye-level angle. The content includes addresses, company names, and personal names. The dataset can be used to train French OCR models and handwritten text recognition systems.
French HTR dataset, French OCR dataset, French handwriting OCR dataset, OCR training dataset

5,711 Images Korean Handwriting OCR Dataset

The text carriers include A4 paper, lined paper, quadrille paper, etc. Images were captured with cellphones at an eye-level angle. The content includes Korean compositions, poetry, prose, news, stories, etc. Each text line is annotated with a quadrilateral bounding box and transcribed. The dataset can be used to train Korean OCR models and handwritten text recognition systems.
OCR training dataset, Korean HTR dataset, Korean OCR dataset, Korean handwriting OCR dataset

1,000 People - Spanish Handwriting OCR Dataset

The writers are Europeans who regularly write in Spanish. The samples were captured with a scanner at an eye-level angle. The content includes addresses, company names, and personal names. The dataset can be used to train Spanish OCR models and handwritten text recognition systems.
OCR training dataset, Spanish handwriting OCR dataset, Spanish OCR dataset, Spanish HTR dataset

14,980 PPT Images – Multilingual OCR Dataset (8 Languages)

This dataset contains 14,980 PowerPoint slide images across 8 languages (French, Korean, Japanese, Spanish, German, Italian, Portuguese, and Russian). It covers multiple scenes, a range of shooting angles and distances, and varied lighting conditions. Each text line is annotated with a quadrilateral bounding box and transcribed. The dataset can be used for tasks such as developing multilingual OCR systems.
multilingual PPT OCR dataset, PowerPoint OCR dataset for AI, OCR training dataset, AI dataset for PowerPoint text extraction

Why off-the-shelf Datasets

  • Copyright

    Clear copyright, ready to check
  • Security

    Properly authorized and secure to use
  • Professional

    Designed and produced by AI data experts
  • Diversity

    Collected from a variety of real scenes
  • Cost Effective

    More cost-efficient than tailored data
  • Efficiency

    Ready to go, delivered in seconds