OCR Datasets

Instantly enhance AI model performance with high-quality off-the-shelf datasets.

Data Type

All (30)
Document (2)
General Scenario (13)
Handwriting (15)
Internet image (3)
Invoice (2)
Others (4)
Test paper (2)
Table (1)

Language

All (30)
Chinese (8)
English (4)
Hindi (4)
Japanese (7)
Korean (7)
Others (21)
Vietnamese (4)

14,511 Images English Handwriting OCR Dataset

The text carriers include A4 paper, lined paper, English paper, etc. The images were captured with cellphones at an eye-level angle. The content includes English compositions, poetry, prose, news, stories, etc. Each text line is annotated with a quadrilateral bounding box and a transcription. The dataset can be used for tasks such as English handwriting OCR.
English ocr dataset English handwriting ocr dataset English HTR Dataset OCR training dataset

5,147 Images Japanese Handwriting OCR Dataset

The text carriers include A4 paper, lined paper, quadrille paper, etc. The images were captured with cellphones at an eye-level angle. The content includes Japanese compositions, poetry, prose, news, stories, etc. Each text line is annotated with a quadrilateral bounding box and a transcription. The dataset can be used for tasks such as training Japanese OCR models and handwritten text recognition systems.
Japanese ocr dataset Japanese handwriting ocr dataset Japanese HTR Dataset OCR training dataset

1,000 People - Italian Handwriting OCR Dataset

The writers are Europeans who regularly write in Italian. The collection device is a scanner, and the collection angle is eye-level. The content includes addresses, company names, and personal names. The dataset can be used for tasks such as training Italian OCR models and handwritten text recognition systems.
Italian ocr dataset Italian handwriting ocr dataset Italian HTR Dataset OCR training dataset

1,000 People - German Handwriting OCR Dataset

The writers are Europeans who regularly write in German. The collection device is a scanner, and the collection angle is eye-level. The content includes addresses, company names, and personal names. The dataset can be used for tasks such as training German OCR models and handwritten text recognition systems.
German ocr dataset German handwriting ocr dataset German HTR Dataset OCR training dataset

1,000 People - French Handwriting OCR Dataset

The writers are Europeans who regularly write in French. The collection device is a scanner, and the collection angle is eye-level. The content includes addresses, company names, and personal names. The dataset can be used for tasks such as training French OCR models and handwritten text recognition systems.
French HTR Dataset French ocr dataset French handwriting ocr dataset OCR training dataset

5,711 Images Korean Handwriting OCR Dataset

The text carriers include A4 paper, lined paper, quadrille paper, etc. The images were captured with cellphones at an eye-level angle. The content includes Korean compositions, poetry, prose, news, stories, etc. Each text line is annotated with a quadrilateral bounding box and a transcription. The dataset can be used for tasks such as training Korean OCR models and handwritten text recognition systems.
OCR training dataset Korean HTR Dataset Korean ocr dataset Korean handwriting ocr dataset

1,000 People - Spanish Handwriting OCR Dataset

The writers are Europeans who regularly write in Spanish. The collection device is a scanner, and the collection angle is eye-level. The content includes addresses, company names, and personal names. The dataset can be used for tasks such as training Spanish OCR models and handwritten text recognition systems.
OCR training dataset Spanish handwriting ocr dataset Spanish ocr dataset Spanish HTR Dataset

14,980 PPT Images – Multilingual OCR Dataset (8 Languages)

This dataset contains 14,980 PowerPoint slide images across 8 languages (French, Korean, Japanese, Spanish, German, Italian, Portuguese, and Russian). It covers multiple scenes, a range of shooting angles and distances, and varied lighting conditions. Each text line is annotated with a quadrilateral bounding box and transcribed. The dataset can be used for tasks such as developing multilingual OCR systems.
multilingual PPT OCR dataset PowerPoint OCR dataset for AI OCR training dataset AI dataset for PowerPoint text extraction

5,162 Images – Traditional Chinese Handwriting OCR Dataset

This dataset contains 5,162 handwriting images from 262 individuals, covering Traditional Chinese characters used in Taiwan. Each text line is annotated with a quadrilateral bounding box. The handwriting OCR data can be used for training and evaluating OCR models, Traditional Chinese character recognition systems, and AI-based handwriting applications. The accuracy of line-level annotation and transcription is at least 97%.
Traditional Chinese handwriting OCR dataset handwriting OCR dataset for Traditional Chinese Traditional Chinese handwriting recognition

Why off-the-shelf Datasets

  • Copyright

    Clear copyright and ready to check
  • Security

    Properly authorized and secure to use
  • Professional

    Designed and produced by AI data experts
  • Diversity

    Collected from a variety of real scenes
  • Cost Effective

    More cost-efficient than tailored data
  • Efficiency

    Ready to go and delivered in seconds