108 Hours Portuguese Speech Dataset with Entity Annotations
entity annotated speech dataset
speech dataset for ner
Portuguese speech dataset
portuguese ner dataset
entity recognition dataset
This Portuguese speech dataset covers a wide range of entity types, including personal names, phone numbers, addresses, alphanumeric sequences, email addresses, product model numbers, product serial numbers, and monetary amounts. It reflects real-life interaction scenarios and includes corresponding transcriptions and other attribute information. The recordings were collected from speakers with diverse geographical and background profiles, which is intended to improve model performance on complex real-world tasks, and the dataset has undergone quality validation by multiple AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use; all of our datasets comply with GDPR, CCPA, and PIPL.
This is a paid dataset for commercial use, research, and more. Licensed ready-made datasets help jump-start AI projects.
Specifications
Format
16 kHz, 16-bit, WAV, mono channel
Recording environment
Quiet indoor environment; normal environment (contains noise that does not affect recognition)
Recording content
Speakers read and record the given texts. Each text contains at least one type of specified entity: person, phone number, address, alphanumeric sequence, email, product model, product serial number, or money.
Country
Europe
Language
Portuguese
Accuracy
Word Accuracy Rate (WAR): 98%. Punctuation, tags, and non-speech annotations are subjective and are therefore excluded from the accuracy statistics.
Device
Android phone, iPhone
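The format line in the specifications (16 kHz, 16-bit, WAV, mono) can be checked against a delivered file before training. The following is a minimal sketch using only Python's standard `wave` module; the function name `matches_spec` and its parameters are hypothetical and not part of any tooling supplied with the dataset.

```python
import wave


def matches_spec(path, rate=16000, sample_width_bytes=2, channels=1):
    """Return True if the WAV header matches the listed format.

    Defaults mirror the specification: 16 kHz sample rate,
    16-bit samples (2 bytes), and a single (mono) channel.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == rate
            and wav.getsampwidth() == sample_width_bytes
            and wav.getnchannels() == channels
        )
```

A call such as `matches_spec("G00002T03P00107.wav")` would return `True` for a file that follows the listed format. The check reads only the header, so it stays fast even on a large corpus.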
Sample
Audio
Olá, gostava de fazer o acompanhamento do pedido para [LOC/]Largo do Rio, dezanove, Évora[/LOC], porque houve um pequeno problema com a entrega anterior.
Olá, gostava de fazer o acompanhamento do pedido para [LOC/]Largo do Rio, 19, Évora[/LOC], porque houve um pequeno problema com a entrega anterior.
Audio
É possível ajudar com o meu Monitor Philips da série [PROSER/]D R B dois Y seis B Q Z W N R I X três W[/PROSER] que está avariado?
É possível ajudar com o meu Monitor Philips da série [PROSER/]DRB2Y6BQZWNRIX3W[/PROSER] que está avariado?
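The sample transcripts above mark entities with an opening tag of the form `[LABEL/]` and a closing tag of the form `[/LABEL]`. The sketch below shows one way such spans might be extracted with a regular expression. It assumes that labels are uppercase letters and that tags do not nest; only `LOC` and `PROSER` appear in the samples, so the full tag inventory is not known from this page. The names `ENTITY`, `extract_entities`, and `strip_tags` are hypothetical.

```python
import re

# Assumption: tags look like [LABEL/]text[/LABEL], labels are uppercase
# letters, and spans are not nested. Only LOC and PROSER are confirmed.
ENTITY = re.compile(r"\[([A-Z]+)/\](.*?)\[/\1\]")


def extract_entities(transcript):
    """Return a list of (label, text) pairs, one per tagged span."""
    return [(m.group(1), m.group(2)) for m in ENTITY.finditer(transcript)]


def strip_tags(transcript):
    """Remove the tag markers while keeping the entity text in place."""
    return ENTITY.sub(lambda m: m.group(2), transcript)


sample = (
    "É possível ajudar com o meu Monitor Philips da série "
    "[PROSER/]DRB2Y6BQZWNRIX3W[/PROSER] que está avariado?"
)
print(extract_entities(sample))  # [('PROSER', 'DRB2Y6BQZWNRIX3W')]
```

`strip_tags` yields plain text that could serve as an ASR target, while `extract_entities` provides the labels for an entity recognition task.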