9,574 Images – Multilingual Handwriting OCR Dataset (8 Languages)

handwriting OCR dataset

handwritten text recognition data

multi-language handwriting OCR data

OCR training data

polygon-annotated handwriting dataset

This dataset includes 9,574 handwriting images across 8 languages, including English, Spanish, Portuguese and more. The data covers multiple collection scenes, different text carriers and different photographic angles (looking up, eye-level, looking down). Each text line is annotated with a quadrilateral polygon and a transcription. The dataset can be used for training and evaluating OCR models, handwriting recognition systems, and multilingual text extraction tasks in AI and computer vision.
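As a rough illustration of how line-level quadrilateral annotations with transcriptions might be consumed, the sketch below parses one JSON record and derives an axis-aligned rectangle that could be used for cropping. The schema is an assumption made for this example only: the keys `lines`, `points` and `transcription`, and the `TextLine` and `parse_annotation` names, are hypothetical and are not taken from the dataset's actual annotation files.

```python
import json
from dataclasses import dataclass


@dataclass
class TextLine:
    """One annotated text line: four (x, y) vertices plus its transcription."""
    quad: list
    text: str

    def bounding_rect(self):
        """Return (x_min, y_min, x_max, y_max) enclosing the quadrilateral."""
        xs = [x for x, _ in self.quad]
        ys = [y for _, y in self.quad]
        return min(xs), min(ys), max(xs), max(ys)


def parse_annotation(raw: str):
    """Parse a JSON string in the hypothetical schema into TextLine objects."""
    doc = json.loads(raw)
    lines = []
    for item in doc["lines"]:  # "lines" is an assumed key
        quad = [tuple(p) for p in item["points"]]  # "points" is an assumed key
        if len(quad) != 4:
            raise ValueError("expected a quadrilateral with exactly 4 vertices")
        lines.append(TextLine(quad=quad, text=item["transcription"]))
    return lines


# Hand-written sample record, not real data from the dataset.
sample = (
    '{"lines": [{"points": [[10, 20], [210, 25], [208, 60], [8, 55]], '
    '"transcription": "Hola mundo"}]}'
)
lines = parse_annotation(sample)
print(lines[0].text, lines[0].bounding_rect())
```

The real field names and nesting should be read from the delivered annotation files before adapting this.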

This is a paid dataset for commercial use, research purposes and more. Licensed, ready-made datasets help jump-start AI projects.

Specifications

Data size: 9,574 images, 243,240 bounding boxes

Language distribution: English, Spanish, Portuguese, French, German, Japanese, Italian and Dutch

Collecting environment: blackboards, whiteboards, green boards

Device: cellphone

Photographic angle: eye-level, looking down, looking up

Data format: images are in .jpg and other common image formats; annotation files are in .json

Annotation content: line-level quadrilateral (polygon) bounding boxes with transcriptions of the text

Accuracy rate: a bounding box counts as qualified when the error of each vertex of the quadrilateral is within 5 pixels; bounding box accuracy is not less than 95%, and text transcription accuracy is not less than 95%
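
The accuracy criterion above can be sketched in code. This is a minimal illustration, not the vendor's quality-control procedure: it assumes the vertex error is the Euclidean distance to a reference vertex, that the 5-pixel bound is inclusive, and that vertices are listed in the same order in both quadrilaterals. The function names are hypothetical.

```python
import math


def max_vertex_error(annotated, reference):
    """Largest Euclidean distance between corresponding vertices, in pixels."""
    if len(annotated) != 4 or len(reference) != 4:
        raise ValueError("both quadrilaterals must have exactly 4 vertices")
    return max(
        math.hypot(ax - rx, ay - ry)
        for (ax, ay), (rx, ry) in zip(annotated, reference)
    )


def is_qualified(annotated, reference, tolerance_px=5.0):
    """A box is qualified when every vertex lies within the tolerance."""
    return max_vertex_error(annotated, reference) <= tolerance_px


def box_accuracy(pairs, tolerance_px=5.0):
    """Fraction of (annotated, reference) pairs that are qualified."""
    if not pairs:
        raise ValueError("no boxes to evaluate")
    qualified = sum(is_qualified(a, r, tolerance_px) for a, r in pairs)
    return qualified / len(pairs)


# Made-up coordinates for illustration only.
reference = [(0, 0), (100, 0), (100, 30), (0, 30)]
good = [(3, 4), (100, 0), (101, 30), (0, 28)]   # worst vertex is 5 px off
bad = [(0, 0), (100, 0), (100, 30), (0, 36)]    # one vertex is 6 px off

accuracy = box_accuracy([(good, reference), (bad, reference)])
print(accuracy)
```

Under these assumptions, a batch would meet the stated specification when `box_accuracy` returns at least 0.95.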

Recommended Dataset

5,162 Images – Traditional Chinese Handwriting OCR Dataset

This dataset contains 5,162 handwriting images from 262 individuals, covering Traditional Chinese characters used in Taiwan. Each text in the data was annotated with a quadrilateral bounding box. The handwriting OCR data can be used for training and evaluating OCR models, Traditional Chinese character recognition systems, and AI-based handwriting applications. The accuracy of line-level annotation and transcription is at least 97%.

Traditional Chinese handwriting OCR dataset; handwriting OCR dataset for Traditional Chinese; Traditional Chinese handwriting recognition

Japanese Handwriting OCR Dataset – 4,538 Handwritten Text Images

This dataset contains 4,538 Japanese handwritten text images collected from 101 individual writers, written on A4 paper. The content covers social livelihood, entertainment, travel, sports, movies, composition and other fields. Annotation includes character-level and line-level rectangular bounding boxes, each with text transcription. The dataset can be used for training and evaluating Japanese handwriting OCR models, handwritten text recognition systems, and document understanding pipelines.

handwriting OCR dataset; handwritten OCR dataset; handwriting recognition dataset; Japanese handwriting OCR dataset

Handwriting OCR Dataset – Japanese and Korean (22,163 Images)

This dataset contains handwritten text images collected from 100 individuals, including 50 Japanese, 49 Koreans and 1 Afghan. The corpus differs from subject to subject. The data diversity includes multiple cellphone models and different corpora. This dataset can be used for tasks such as training handwriting OCR models, handwritten text recognition systems, and multilingual OCR pipelines.

handwriting OCR dataset; handwritten OCR dataset; handwriting recognition dataset; Korean handwriting OCR dataset; multilingual handwriting OCR dataset; Japanese handwriting OCR dataset

1,000 People - Italian Handwriting OCR Dataset

The writers are Europeans who often write Italian. The device is a scanner, and the collection angle is eye-level. The dataset content includes addresses, company names and personal names. The dataset can be used for tasks such as training Italian OCR models and handwritten text recognition systems.

Italian OCR dataset; Italian handwriting OCR dataset; Italian HTR dataset; OCR training dataset

1,000 People - Spanish Handwriting OCR Dataset

The writers are Europeans who often write Spanish. The device is a scanner, and the collection angle is eye-level. The dataset content includes addresses, company names and personal names. The dataset can be used for tasks such as training Spanish OCR models and handwritten text recognition systems.

OCR training dataset; Spanish handwriting OCR dataset; Spanish OCR dataset; Spanish HTR dataset

1,000 People - German Handwriting OCR Dataset

The writers are Europeans who often write German. The device is a scanner, and the collection angle is eye-level. The dataset content includes addresses, company names and personal names. The dataset can be used for tasks such as training German OCR models and handwritten text recognition systems.

German OCR dataset; German handwriting OCR dataset; German HTR dataset; OCR training dataset

1,000 People - French Handwriting OCR Dataset

The writers are Europeans who often write French. The device is a scanner, and the collection angle is eye-level. The dataset content includes addresses, company names and personal names. The dataset can be used for tasks such as training French OCR models and handwritten text recognition systems.

French HTR dataset; French OCR dataset; French handwriting OCR dataset; OCR training dataset

222,522 Images – Chinese Handwriting OCR Data

The writing environment includes A4 paper, square paper, lined paper, whiteboards, colored notes, answer sheets, etc. The writing contents include poetry, prose, store activity notices, greetings, wish lists, excerpts, compositions, notes, etc. The data diversity includes multiple writing papers, multiple fonts, multiple writing contents and multiple photographic angles. The collecting angles are looking up and eye-level. Annotation consists of line-level/column-level quadrilateral bounding boxes and transcriptions of the texts. The dataset can be used for tasks such as Chinese handwriting OCR.

Multiple writing papers; Multiple fonts; Multiple writing contents; Multiple photographic angles


