80,120,000 Groups – Chinese-English Parallel Corpus Data
Chinese-English Parallel Corpus Data
Chinese-English Alignment
Corpus
A Chinese-English parallel translation corpus stored in TXT files. It covers fields such as travel, medicine, daily life, and TV drama. The data has been cleaned, desensitized, and quality-checked. It can serve as a base corpus for text data analysis and for machine translation.
This is a paid dataset for commercial use, research, and more. Licensed ready-made datasets help jump-start AI projects.
Specifications
Storage format: TXT
Data content: Chinese-English Parallel Corpus Data
Data size: 80.12 million pairs of Chinese-English parallel corpus data
Language: Chinese, English
Application scenario: machine translation
Recommended Dataset
5,310,000 Groups – Chinese-Germany Parallel Corpus Data
5.14 million sentence pairs of Chinese-German parallel corpus data, stored in text format. It covers multiple fields such as tourism, medical treatment, daily life, and news. The data has been desensitized and quality-checked. It can be used as a base corpus for text data analysis in fields such as machine translation.
Chinese-Germany Parallel Corpus Data Chinese-Germany Parallel Corpus Parallel Corpus Data Alignment Corpus Data
7,440,000 Groups – Chinese-Hindi Parallel Corpus Data
7.44 million sentence pairs of Chinese-Hindi parallel corpus data, stored in text format. It covers multiple fields such as tourism, medical treatment, daily life, and news. The data has been desensitized and quality-checked. It can be used as a base corpus for text data analysis in fields such as machine translation.
Chinese-Hindi Parallel Corpus Data Chinese-Hindi Parallel Corpus Parallel Corpus Data Alignment Corpus Data
1,080,000 Groups – English-Russian Parallel Corpus Data
An English-Russian parallel corpus of 1,080,000 groups in total. Political, pornographic, personal-information, and other sensitive vocabulary has been excluded. It can be used as a base corpus for text-based data analysis in machine translation and other fields.
English and Russian parallel corpus data English and Russian corpus collection English Russian parallel corpus Parallel Corpus Data Alignment Corpus Data
1,000,000 Groups - Chinese-Russian Parallel Corpus Data
1 million sentence pairs of Chinese-Russian parallel corpus data, stored in .txt format. It covers multiple fields such as tourism, medical treatment, daily life, and TV drama. The data has been desensitized and quality-checked. It can be used as a base corpus for text data analysis in fields such as machine translation.
Chinese-Russian parallel corpus data Chinese-Russian alignment Parallel Corpus Data Alignment Corpus Data
6,020,000 Groups - Chinese-French Parallel Corpus Data
1 million sentence pairs of Chinese-French parallel corpus data, stored in txt format. It covers multiple fields such as tourism, medical treatment, daily life, and TV drama. The data has been desensitized and quality-checked. It can be used as a base corpus for text data analysis in fields such as machine translation.
Chinese-French parallel corpus data Chinese-French alignment Parallel Corpus Data Alignment Corpus Data
9,830,000 Groups - Chinese-Japanese Parallel Corpus Data
9.83 million sentence pairs of Chinese-Japanese parallel corpus data, stored in txt format. It covers multiple fields including general, IT, news, patent, and international engine. The data has been desensitized and quality-checked. It can be used as a base corpus for text data analysis in fields such as machine translation.
Chinese-Japanese parallel corpus Chinese-Japanese alignment Parallel Corpus Data Alignment Corpus Data
380,000 Groups - Uighur-Chinese Parallel Corpus Data
Uighur text with its corresponding parallel Chinese text, 38,000 groups in total. The data has been cleaned, desensitized, and quality-checked. It can be used as a base corpus for text data analysis in machine translation and related fields.
Parallel corpus Uighur corpus machine translation
1,340,000 Groups – English-Korean Parallel Corpus Data
An English-Korean parallel corpus of 1,340,000 groups in total. Political, pornographic, personal-information, and other sensitive vocabulary has been excluded. It can be used as a base corpus for text-based data analysis in machine translation and other fields.
English and Korean parallel corpus data English and Korean corpus collection Alignment Corpus Parallel Corpus Data Alignment Corpus Data