144 Hours Arabic Speech Dataset with Transcriptions for Speech Recognition
arabic speech dataset
arabic voice dataset
arabic speech corpus
arabic audio dataset
speaker diarization dataset
This dataset contains 144 hours of Arabic conversational speech, recorded on smartphones during spontaneous dialogues. It includes high-quality transcriptions, speaker IDs, gender, and additional metadata. The recordings were collected from speakers across diverse geographic regions and demographic backgrounds, helping improve model performance in real-world conversational speech applications. The dataset has been quality-validated by multiple AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.
This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Format
16 kHz, 16-bit, WAV, mono channel;
Content category
Speakers in free conversation without a set topic;
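The Format specification above (16 kHz, 16-bit, mono WAV) can be checked against a delivered file before it enters a training pipeline. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` and the file path in the usage note are illustrative assumptions, not part of the dataset's tooling.

```python
import wave

# Expected values taken from the "Format" specification above.
EXPECTED_SAMPLE_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit = 2 bytes per sample
EXPECTED_CHANNELS = 1  # mono


def check_wav_format(path):
    """Return a list of mismatches between a WAV file and the listed format.

    An empty list means the file header matches the specification.
    (Hypothetical helper, written for illustration only.)
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_SAMPLE_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {wav.getsampwidth() * 8} bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems
```

Calling `check_wav_format("sample.wav")` on a local file (the path here is a placeholder) returns an empty list when the header matches, or a short description of each mismatch otherwise. The check reads only the WAV header, so it says nothing about audio quality or background noise.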