1,000 Images – Japanese Invoice OCR Dataset
Japanese invoice OCR dataset
invoice OCR dataset
invoice OCR data
invoice recognition dataset
OCR training data
This dataset contains 1,000 Japanese invoice images: 500 with basic virtual editing and 500 with professional editing. Data diversity covers different invoice contents, different editing types, and multiple invoice formats. Company names, addresses, personal names, fax numbers, phone numbers, and other sensitive information on the invoices have been virtually edited and are not real. The data can be used for tasks such as invoice detection, invoice recognition, and end-to-end OCR.
This is a paid dataset for commercial use, research, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Data size
1,000 images, including 500 images for basic editing and 500 images for professional editing
Data diversity
different invoice contents, different editing types, multiple invoice formats
Device
scanner
Data format
the data is stored in two formats: one is PDF format, and the other is JPG format (converted from PDF)
Data requirement
all sensitive fields (company names, addresses, personal names, fax numbers, and telephone numbers) have been anonymized with synthetic data; no real-world identifiers remain
Accuracy rate
according to the collection requirements, the collection accuracy is not less than 95%
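Because the data ships in two formats, with each JPG converted from a PDF, a buyer may want to confirm that every invoice is present in both. The Python sketch below is one way to do that. It assumes that a PDF and its JPG share a filename stem and sit somewhere under one root folder; the listing does not document the delivery layout, so the folder structure and the helper name `pair_by_stem` are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def pair_by_stem(root, exts=(".pdf", ".jpg")):
    """Group files under `root` by filename stem.

    Returns (complete, incomplete):
      complete   - dict mapping stem -> {extension: path} for stems that
                   have every extension in `exts`
      incomplete - sorted list of stems missing at least one extension

    Assumption: matching PDF/JPG files share a stem (e.g. 00001.pdf and
    00001.jpg). This is a guess, not a documented property of the dataset.
    """
    groups = defaultdict(dict)
    for path in sorted(Path(root).rglob("*")):
        ext = path.suffix.lower()
        if path.is_file() and ext in exts:
            groups[path.stem][ext] = path

    complete = {
        stem: files
        for stem, files in groups.items()
        if all(ext in files for ext in exts)
    }
    incomplete = sorted(stem for stem in groups if stem not in complete)
    return complete, incomplete
```

Calling `pair_by_stem("path/to/dataset")` returns the matched pairs plus a list of stems that lack one of the two formats, which can be compared against the stated total of 1,000 images.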