Tamil Speech Dataset – 500 Hours Monologue Audio Corpus

Tamil speech dataset
Tamil audio dataset
Tamil language dataset
Tamil monologue dataset
Tamil voice corpus
Tamil ASR data
scripted speech in Tamil
smartphone Tamil dataset
speech recognition Tamil dataset
multilingual speech data

This dataset includes 500 hours of scripted Tamil monologue speech collected using smartphones. Each sample is transcribed with text content and metadata such as speaker ID, gender, and age. The dataset features diverse speakers from various regions, making it highly representative of real-world Tamil language use and suitable for automatic speech recognition (ASR), text-to-speech (TTS), voice activity detection (VAD), and natural language processing (NLP) tasks. Validated by leading AI companies, the dataset is designed to enhance model robustness in multilingual environments and low-resource languages. All data was collected in full compliance with global privacy regulations including GDPR, CCPA, and PIPL, ensuring ethical sourcing and responsible AI development.

Paid Datasets
This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Format: 16 kHz, 16-bit, uncompressed WAV, mono channel
Recording condition: quiet indoor environment, low background noise, no echo
Recording device: Android smartphone, iPhone
Speaker: about 500 people
Language: Tamil
Features of annotation: transcription text
Accuracy rate: Word Accuracy Rate (WAR) 95%
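
The audio format listed above (16 kHz, 16-bit, uncompressed, mono WAV) can be checked against a file's header before the data goes into a training pipeline. The following is a minimal sketch using only Python's standard `wave` module. The function names `check_wav_format` and `duration_seconds` and the constants are illustrative assumptions and are not part of the dataset or of any vendor tooling.

```python
import wave

# Expected values taken from the Format entry of the specification.
EXPECTED_SAMPLE_RATE = 16000  # 16 kHz
EXPECTED_SAMPLE_WIDTH = 2     # 16 bit = 2 bytes per sample
EXPECTED_CHANNELS = 1         # mono


def check_wav_format(path):
    """Return a list of mismatches between a WAV header and the stated format.

    Hypothetical helper: an empty list means the header matches.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_SAMPLE_RATE:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"sample width is {wav.getsampwidth() * 8} bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
        # The wave module reports "NONE" for uncompressed PCM data.
        if wav.getcomptype() != "NONE":
            problems.append(f"compression type is {wav.getcomptype()}")
    return problems


def duration_seconds(path):
    """Return the clip length in seconds, computed from the frame count."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()
```

Calling `check_wav_format("sample.wav")` on a conforming file (the path here is a placeholder) would return an empty list. Summing `duration_seconds` over all files is one way to compare a delivered corpus against the stated 500 hours.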
Sample
  • Audio

    ஒவ்வொரு மாணவர்களின் வளர்ச்சிக்கும் பள்ளிக்கூடம் மிகவும் அவசியமானது.

  • Audio

    எனது தமிழ் பாடப்புத்தகத்தில் சரியா அல்லது தவறா கேள்விகள் கேட்கப்பட்டுள்ளது.

  • Audio

    சீன வாய்மொழி கற்றுக்கொள்ள ஆசை.

  • Audio

    பாடத்திட்டத்தில் கணிதம் எனக்கு மிகவும் பிடிக்கும்.

  • Audio

    பாடத்திட்டத்தில் அந்நிய மொழிகளை தவிர்க்க வேண்டும்.
