
849 Hours - Arabic (Saudi Arabia) Real-world Casual Conversation and Monologue speech dataset

Arabic colloquial speech data
Arabic colloquial video
Arabic multimodal data
Arabic natural dialogue data
Saudi Arabian natural dialogue data
Saudi Arabian multimodal data
multimodal data

Arabic (Saudi Arabia) real-world casual conversation and monologue speech dataset. It covers interviews, variety shows, live streams, and similar sources, mirroring real-world interactions. Recordings are transcribed with text content, speaker ID, gender, and other attributes. The data was collected from a large and geographically diverse pool of speakers to improve model performance on real, complex tasks, and its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. Our datasets are all GDPR, CCPA, and PIPL compliant.
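The page lists the annotation fields (transcription text, speaker ID, gender, and timestamps) but does not document how they are delivered. As a minimal sketch, assuming each annotated segment can be loaded into a record with the hypothetical fields `start`, `end`, `speaker_id`, `gender`, and `text`, total speech time per speaker can be tallied as follows. The `Segment` type and `seconds_per_speaker` function are illustrative names, not part of any vendor tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical record layout; the actual delivery format is not specified on this page.
    start: float      # segment start time in seconds
    end: float        # segment end time in seconds
    speaker_id: str
    gender: str
    text: str         # transcription text


def seconds_per_speaker(segments):
    """Sum the duration of all segments for each speaker ID."""
    totals = defaultdict(float)
    for seg in segments:
        if seg.end < seg.start:
            raise ValueError(f"segment ends before it starts: {seg}")
        totals[seg.speaker_id] += seg.end - seg.start
    return dict(totals)


if __name__ == "__main__":
    demo = [
        Segment(0.0, 4.5, "spk01", "male", "sample"),
        Segment(4.5, 7.0, "spk02", "female", "sample"),
        Segment(7.0, 9.0, "spk01", "male", "sample"),
    ]
    print(seconds_per_speaker(demo))  # {'spk01': 6.5, 'spk02': 2.5}
```

A per-speaker tally like this is one quick way to check how evenly speaking time is spread across the speaker pool before training.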

Paid Datasets
This is a paid dataset licensed for commercial use, research, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Format: 16 kHz, 16-bit, WAV, mono
Content category: interviews, variety shows, live streams, etc.
Recording environment: low background noise
Country: Saudi Arabia (SAU)
Language (region) code: ar-SA
Language: Arabic
Annotation features: transcription text, timestamps, speaker ID, gender
Accuracy rate: Sentence Accuracy Rate (SAR) of 95%
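Delivered files can be checked against the stated audio format (16 kHz, 16-bit, mono WAV) with Python's standard `wave` module. The page does not define how Sentence Accuracy Rate is computed; the sketch below assumes the common reading, the share of sentences whose transcription exactly matches a reference. The function names `matches_spec` and `sentence_accuracy_rate` are hypothetical.

```python
import os
import tempfile
import wave


def matches_spec(path, rate=16000, sample_width_bytes=2, channels=1):
    """Return True if the WAV file at `path` has the given rate, sample width and channel count.

    The defaults correspond to the listed format: 16 kHz, 16-bit (2 bytes per sample), mono.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == rate
            and wav.getsampwidth() == sample_width_bytes
            and wav.getnchannels() == channels
        )


def sentence_accuracy_rate(references, hypotheses):
    """Share of sentences whose hypothesis exactly matches the reference.

    Assumed definition: the page does not say how SAR is computed.
    Leading/trailing whitespace is ignored; everything else must match exactly.
    """
    if len(references) != len(hypotheses):
        raise ValueError("reference and hypothesis lists must have the same length")
    if not references:
        raise ValueError("no sentences to score")
    correct = sum(ref.strip() == hyp.strip() for ref, hyp in zip(references, hypotheses))
    return correct / len(references)


if __name__ == "__main__":
    # Write one second of 16 kHz, 16-bit, mono silence to a temporary file and check it.
    path = os.path.join(tempfile.mkdtemp(), "silence.wav")
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 16000)
    print(matches_spec(path))  # True
    print(sentence_accuracy_rate(["a b", "c d"], ["a b", "c x"]))  # 0.5
```

Under this exact-match definition, a 95% SAR means 95 of every 100 audited sentences carry a transcription identical to the reference.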
Samples
  • Audio

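    اليهودية، والمسحية، والهندوسية اللي ما عمرنا شفنا حد منهم ينتقدها او يتكلم عنها اصلا، الرمي كله للأسلام شفتوا الدليل،
    (English: Judaism, Christianity, and Hinduism, which we have never seen any of them criticize or even talk about at all; all the attacks are aimed at Islam. You saw the evidence,)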

  • Audio

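    طبعا، احكي شلون تغى تجدد الخطاب الديني؟ وش، تبغى تجدد الخطاب الديني، تبي تغير نصوص القرآن،
    (English: Of course. Tell me, how do you want to renew religious discourse? What, you want to renew religious discourse? You want to change the texts of the Quran,)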

  • Audio

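    اكتشفت شئ يعني والعياذ بالله ما كان يعرفه النبي عليه الصلاة والسلام،
    (English: You discovered something, God forbid, that the Prophet, peace be upon him, did not know,)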

  • Audio

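    ما ابغى اقول اكثر من كذا صراحة ما اقدر حتى أقولها كا كمثال وإلا كنكتة،
    (English: I don't want to say more than that; honestly, I can't even say it as an example or as a joke,)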

  • Audio

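    يعني شفتوه انتم وسمعتوه، المذيعة تقول بما في ذلك النص القرآني رمي للأسلام، راحت تقول له طبعا،
    (English: I mean, you saw it and heard it yourselves: the presenter says "including the Quranic text", an attack on Islam; she went and told him "of course",)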

Recommended Datasets
97 Hours - German (Germany) Children Real-world Casual Conversation and Monologue speech dataset

German (Germany) children's real-world casual conversation and monologue speech dataset. It covers self-media, conversation, live streams, lectures, variety shows, and other general domains, mirroring real-world interactions. Recordings are transcribed with text content, speaker ID, gender, age, accent, and other attributes. The data was collected from a large and geographically diverse pool of speakers (children aged 12 and under) to improve model performance on real, complex tasks, and its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. Our datasets are all GDPR, CCPA, and PIPL compliant.

Spontaneous Speech text annotation German
548 Hours - Taiwanese Accent Mandarin (China) Real-world Casual Conversation and Monologue speech dataset

Taiwanese-accented Mandarin (China) real-world casual conversation and monologue speech dataset. It covers self-media, conversation, live streams, and other general domains, mirroring real-world interactions. Recordings are transcribed with text content, speaker ID, gender, and other attributes. The data was collected from a large and geographically diverse pool of speakers to improve model performance on real, complex tasks, and its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. Our datasets are all GDPR, CCPA, and PIPL compliant.

Chinese spoken video voice data Chinese voice data Chinese spoken video data Chinese multimodal data
127 Hours - Malay (Malaysia) Spontaneous Dialogue Smartphone speech dataset

Malay (Malaysia) spontaneous dialogue smartphone speech dataset, collected from dialogues on given topics and covering 20+ domains. Recordings are transcribed with text content, speaker ID, gender, age, and other attributes. The data was collected from a geographically diverse group of 142 native speakers to improve model performance on real, complex tasks, and its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. Our datasets are all GDPR, CCPA, and PIPL compliant.

audio data dataset conversational asr data Malay
143 Hours - Uyghur (China) Spontaneous Dialogue Telephony speech dataset

Uyghur (China) spontaneous dialogue telephony speech dataset, collected from dialogues on given topics and covering 20+ domains. Recordings are transcribed with text content, speaker ID, gender, age, and other attributes. The data was collected from a geographically diverse group of 320 native speakers to improve model performance on real, complex tasks, and its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. Our datasets are all GDPR, CCPA, and PIPL compliant.

audio data dataset conversational asr data Uyghur
212 Hours - Burmese (Myanmar) Real-world Casual Conversation and Monologue speech dataset

Burmese (Myanmar) real-world casual conversation and monologue speech dataset. It covers the service, conversation, and interview domains, mirroring real-world interactions. Recordings are transcribed with text content, speaker ID, gender, and other attributes. The data was collected from a large and geographically diverse pool of speakers to improve model performance on real, complex tasks, and its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. Our datasets are all GDPR, CCPA, and PIPL compliant.

Burmese Spontaneous Speech
503 Hours - Russian (Russia) Real-world Casual Conversation and Monologue speech dataset

Russian (Russia) real-world casual conversation and monologue speech dataset. It covers the education, interview, and sports domains, mirroring real-world interactions. Recordings are transcribed with text content, speaker ID, gender, and other attributes. The data was collected from a large and geographically diverse pool of speakers to improve model performance on real, complex tasks, and its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. Our datasets are all GDPR, CCPA, and PIPL compliant.

russia Spontaneous Speech Russian
396 Hours - Korean (Korea) Real-world Casual Conversation and Monologue speech dataset

Korean (Korea) real-world casual conversation and monologue speech dataset. It covers live streams, variety shows, and speeches, mirroring real-world interactions. Recordings are transcribed with text content, speaker ID, gender, and other attributes. The data was collected from a large and geographically diverse pool of speakers to improve model performance on real, complex tasks, and its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. Our datasets are all GDPR, CCPA, and PIPL compliant.

Spontaneous Speech korean
494 Hours - Hindi (India) Real-world Casual Conversation and Monologue speech dataset

Hindi (India) real-world casual conversation and monologue speech dataset. It covers the education, interview, and sports domains, mirroring real-world interactions. Recordings are transcribed with text content, speaker ID, gender, and other attributes. The data was collected from a large and geographically diverse pool of speakers to improve model performance on real, complex tasks, and its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. Our datasets are all GDPR, CCPA, and PIPL compliant.

Spontaneous Speech Hindi