201 Hours - Bengali Speech Dataset with Transcripts for AI Training
bengali speech dataset
bengali ASR dataset
bengali speech corpus
bengali voice dataset
speech dataset india
This Bengali (India) speech dataset contains over 200 hours of recordings of monologues read from given scripts. The recordings are transcribed with text content and other attributes. The speakers were drawn from a broad and geographically diverse population, which is intended to improve model performance on real, complex tasks. The data has been quality-tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.
This is a paid dataset for commercial use, research purposes, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Format: 16 kHz, 16-bit, uncompressed WAV, mono channel
Recording environment: quiet indoor environment, without echo
Recording content (read speech): General texts
Speaker: 264 native speakers in total, 50% male and 50% female
Device: Android mobile phone, iPhone
Language: Bengali
Transcription content: text
Accuracy rate: Word Accuracy Rate (WAR) 95%
Application scenarios: speech recognition, voiceprint recognition
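As a rough illustration of how the stated audio format could be checked on delivery, the sketch below uses Python's standard `wave` module to compare a file's header against the "Format" entry above (16 kHz, 16-bit, uncompressed, mono). The function name `check_wav_format` and the constants are hypothetical and are not part of any tooling shipped with the dataset.

```python
import wave

# Expected values taken from the "Format" specification above.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
EXPECTED_CHANNELS = 1            # mono


def check_wav_format(path):
    """Return a list of mismatches between a WAV header and the stated format.

    An empty list means the header matches all four expectations.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {wav.getsampwidth()} bytes")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
        # The wave module reports "NONE" for uncompressed PCM data.
        if wav.getcomptype() != "NONE":
            problems.append(f"compression type is {wav.getcomptype()}")
    return problems
```

Calling `check_wav_format("sample.wav")` on a local file (the path here is a placeholder) would return an empty list when the header agrees with the specification, and a list of human-readable mismatches otherwise.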
Sample
Audio
এই প্রসঙ্গে সুখকে রংধনুর সঙ্গে তুলনা করা যেতে পারে
Audio
এডি আভিলা রাইজিং ভয়েসেসএর ডিরেক্টর তিনি সম্প্রতি পশ্চিম আফ্রিকার গিনি বিসাউ ভ্রমণ করে এসেছেন
Audio
বাচ্চারদের তোমাকে দরকার আর আমারও তোমাকে দরকার এবং স্টিফেনেরও
Audio
উদাহরণস্বরূপ আলমারির কথাই ধর সে কাপড়চোপড় চারিদিকে ছড়িয়ে ছিটিয়ে রাখত
Audio
মডেল হওয়ার অনুভূতি কেমন এবং আমার মনে হয় তারা যে উত্তরটা খোঁজে তা হলো
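
The specifications quote a Word Accuracy Rate (WAR) of 95% without defining the metric. A common convention, assumed here and not confirmed by this page, is WAR = 1 - WER, where WER is the word-level edit distance between a reference transcript and a second transcript, divided by the number of reference words. The sketch below applies that assumed definition to the first sample transcript. The helper names are hypothetical, and the second transcript is made up by dropping the last word purely for illustration.

```python
def word_errors(reference, hypothesis):
    """Return (edit_distance, reference_length) at the word level.

    Uses the standard Levenshtein recurrence over whitespace-separated
    tokens, counting substitutions, deletions, and insertions equally.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1], len(ref)


def word_accuracy_rate(reference, hypothesis):
    """Assumed definition: WAR = 1 - (word errors / reference word count)."""
    errors, n_ref = word_errors(reference, hypothesis)
    if n_ref == 0:
        raise ValueError("reference transcript is empty")
    return 1.0 - errors / n_ref


# Reference: the first sample transcript from this page.
reference = "এই প্রসঙ্গে সুখকে রংধনুর সঙ্গে তুলনা করা যেতে পারে"
# Illustrative second transcript: the same text with its last word dropped.
hypothesis = " ".join(reference.split()[:-1])
war = word_accuracy_rate(reference, hypothesis)
print(f"WAR = {war:.3f}")
```

Under this definition, a single missing word in a short sentence already lowers the score noticeably, so a 95% figure over a whole corpus corresponds to roughly one word error in twenty reference words.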