108 Hours Portuguese Speech Dataset with Entity Annotations

entity-annotated speech dataset
speech dataset for NER
Portuguese speech dataset
Portuguese NER dataset
entity recognition dataset

This Portuguese speech dataset covers a wide range of entity types, including personal names, phone numbers, addresses, alphanumeric sequences, email addresses, product model numbers, product serial numbers, and monetary amounts, reflecting real-life interaction scenarios. Each recording comes with a corresponding transcription and other attribute information. The data was collected from speakers with diverse geographical and background profiles, which helps models perform well on complex real-world tasks, and its quality has been validated by multiple AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.

Paid Datasets
This is a paid dataset for commercial use, research purposes, and more. Licensed ready-made datasets help jump-start AI projects.
Specifications
Format
16 kHz, 16-bit, WAV, mono channel
Recording environment
Quiet indoor environment; normal environment (contains noise that does not affect recognition)
Recording content
Speakers read and record the given texts; each text contains at least one type of specified entity: person, phone number, address, alphanumeric sequence, email, product model, product serial number, or money.
Country
Europe
Language
Portuguese
Accuracy
Word Accuracy Rate (WAR): 98% (punctuation, tags, and non-speech annotations are subjective and are therefore excluded from the accuracy statistics)
Device
Android phone, iPhone
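
The Format line above can be checked mechanically once a file is in hand. The following is a minimal sketch, assuming the recordings are plain PCM WAV files readable by Python's standard `wave` module; the function name `check_wav_format` and the constant names are hypothetical and not part of the dataset's tooling.

```python
import wave

# Expected values taken from the Format line of the specification.
EXPECTED_RATE_HZ = 16000         # 16 kHz
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
EXPECTED_CHANNELS = 1            # mono


def check_wav_format(path):
    """Return a list of mismatches against the spec; empty if the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {wav.getsampwidth() * 8} bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems
```

Calling `check_wav_format("some_recording.wav")` (a placeholder path) returns an empty list when the file matches all three properties, so it can be used as a quick sanity filter before training.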
Sample
  • Audio

    Olá, gostava de fazer o acompanhamento do pedido para [LOC/]Largo do Rio, dezanove, Évora[/LOC], porque houve um pequeno problema com a entrega anterior.
    Olá, gostava de fazer o acompanhamento do pedido para [LOC/]Largo do Rio, 19, Évora[/LOC], porque houve um pequeno problema com a entrega anterior.

  • Audio

    É possível ajudar com o meu Monitor Philips da série [PROSER/]D R B dois Y seis B Q Z W N R I X três W[/PROSER] que está avariado?
    É possível ajudar com o meu Monitor Philips da série [PROSER/]DRB2Y6BQZWNRIX3W[/PROSER] que está avariado?
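
In the samples, each entity is wrapped as `[LABEL/]text[/LABEL]`, and each transcription appears twice: once with numbers and letters spelled out as spoken, once in written form. The sketch below shows one way to pull the entity spans out of a transcription line. It assumes the tag shape inferred from these two samples holds across the dataset and that tags are not nested; the names `ENTITY_PATTERN`, `extract_entities`, and `strip_tags` are hypothetical.

```python
import re

# Assumed tag shape, inferred from the two samples: [LABEL/]text[/LABEL].
# The backreference \1 requires the closing label to match the opening one.
ENTITY_PATTERN = re.compile(r"\[([A-Z]+)/\](.*?)\[/\1\]", re.DOTALL)


def extract_entities(transcript):
    """Return (label, text) pairs for every tagged entity in the transcript."""
    return [(m.group(1), m.group(2)) for m in ENTITY_PATTERN.finditer(transcript)]


def strip_tags(transcript):
    """Return the transcript with entity tags removed and entity text kept."""
    return ENTITY_PATTERN.sub(lambda m: m.group(2), transcript)


sample = (
    "É possível ajudar com o meu Monitor Philips da série "
    "[PROSER/]DRB2Y6BQZWNRIX3W[/PROSER] que está avariado?"
)
print(extract_entities(sample))
print(strip_tags(sample))
```

Keeping extraction and tag stripping separate lets the same line feed both an NER label set and a plain-text ASR reference.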
