200,475 Sentences - Chinese Text Normalization Data
TN data
text regularized data
speech synthesis data
speech synthesis data set
200,475 Sentences - Chinese Text Normalization Data. Special symbols and Arabic numerals in the sentences are annotated as Chinese characters.
This is a paid dataset for commercial use, research, and more. Licensed ready-made datasets help jump-start AI projects.
Specifications
Data content
200,475 sentences of text transcribed into Chinese characters
Data scale
200,475 original texts with 457,832 annotations
Content source
Sentences extracted from news, articles, novels, and other text types
Language
Chinese
Annotation
Special symbols and Arabic numerals in the sentences are annotated as Chinese characters
Applications
TTS, text normalization
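To make the annotation task concrete, the following is a minimal rule-based sketch of the kind of conversion the annotations describe: rewriting Arabic numerals and one special symbol (%) as Chinese characters. The function names and the rules are illustrative assumptions, not the dataset's annotation guidelines or file format. Real text normalization depends on context (for example, a year is usually read digit by digit), which is the kind of case annotated data is meant to cover.

```python
import re

DIGITS = "零一二三四五六七八九"
UNITS = ["", "十", "百", "千"]  # units for the last four decimal places


def cardinal_to_chinese(n: int) -> str:
    """Read an integer in 0..9999 as a Chinese cardinal number."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only handles 0..9999")
    if n == 0:
        return DIGITS[0]
    s = str(n)
    parts = []
    pending_zero = False
    for i, ch in enumerate(s):
        d = int(ch)
        if d == 0:
            # Remember the gap; a single 零 is emitted only if a
            # non-zero digit follows later.
            pending_zero = True
            continue
        if pending_zero and parts:
            parts.append(DIGITS[0])
        pending_zero = False
        parts.append(DIGITS[d] + UNITS[len(s) - 1 - i])
    text = "".join(parts)
    # 10..19 are conventionally read 十, 十一, ... rather than 一十.
    return text[1:] if text.startswith("一十") else text


def read_number(s: str) -> str:
    """Assumed rule: short numbers are cardinals, others digit by digit."""
    if len(s) <= 4 and not s.startswith("0"):
        return cardinal_to_chinese(int(s))
    return "".join(DIGITS[int(c)] for c in s)


def normalize(text: str) -> str:
    """Replace percentages and bare digit runs with Chinese characters."""
    text = re.sub(r"([0-9]+)%",
                  lambda m: "百分之" + read_number(m.group(1)), text)
    return re.sub(r"[0-9]+", lambda m: read_number(m.group(0)), text)


print(normalize("增长了15%"))
print(normalize("共有105人"))
```

A sketch like this handles only the simplest cases; decimals, dates, phone numbers, and measurement units would each need their own rules or a model trained on annotated examples.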
Recommended Dataset
319,977 Sentences - Mandarin Polyphone Corpus Data
The Mandarin Polyphone Corpus Data is designed for polyphone disambiguation. It covers 603 common Mandarin pinyin pronunciations; the number of corpus entries for each pronunciation varies with the number of phrases available for the character.
Chinese Polysyllabic Corpus, Chinese polyphone corpus, Chinese corpus
200,955 Sentences - Mandarin Prosodic Corpus Data
Four-level prosodic hierarchy annotations for 200,000 carefully selected Chinese texts, covering news and colloquial sentences. Sentence lengths are moderate and sentence patterns are diverse. The data can be used to train TTS front-end prosody prediction models.
Prosodic annotation of Chinese text, prosodic corpus of Chinese text, news prosodic annotation