100 Hours - Punjabi Speech Dataset with Transcripts for AI Training
Punjabi Speech Dataset
Punjabi Audio Dataset
Punjabi Voice Dataset
Punjabi ASR Dataset
This Punjabi Speech Dataset contains 100 hours of high-quality recordings of scripted monologues. The recordings are transcribed with text content and other attributes. The data was collected from a large, geographically diverse pool of speakers, which is intended to improve model performance on real, complex tasks. Its quality has been tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all of our datasets comply with GDPR, CCPA, and PIPL.
This is a paid dataset for commercial use, research, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Format
16kHz, 16bit, uncompressed wav, mono channel
Recording environment
quiet indoor environment, without echo
Recording content (read speech)
General texts
Speaker
139 native speakers in total, 24% male and 76% female.
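As a rough way to check delivered files against the Format entry above (16 kHz, 16-bit, mono, uncompressed WAV), the following minimal sketch reads a WAV header with Python's standard `wave` module. The function names and the `make_silent_wav` helper are hypothetical and exist only for illustration; they are not part of the dataset or any vendor tooling.

```python
import io
import wave

# Values taken from the Format specification: 16 kHz, 16-bit, mono.
EXPECTED_RATE_HZ = 16_000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample
EXPECTED_CHANNELS = 1


def matches_listed_format(wav_source) -> bool:
    """Return True if the WAV header matches the listed format.

    wav_source may be a path string or a binary file-like object.
    """
    with wave.open(wav_source, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
            and wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getcomptype() == "NONE"  # uncompressed PCM
        )


def estimated_pcm_bytes(hours: float) -> int:
    """Estimate raw PCM payload size for audio in the listed format."""
    bytes_per_second = (
        EXPECTED_RATE_HZ * EXPECTED_SAMPLE_WIDTH_BYTES * EXPECTED_CHANNELS
    )
    return int(hours * 3600 * bytes_per_second)


def make_silent_wav(rate_hz: int, channels: int, seconds: int = 1) -> io.BytesIO:
    """Hypothetical helper: build an in-memory 16-bit silent WAV for testing."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * rate_hz * channels * seconds)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(matches_listed_format(make_silent_wav(16_000, 1)))  # conforming file
    print(matches_listed_format(make_silent_wav(44_100, 2)))  # non-conforming file
    print(estimated_pcm_bytes(100))  # raw audio bytes for 100 hours
```

By this estimate, 100 hours of audio in the listed format is about 11.5 GB of raw PCM, not counting WAV headers or transcript files.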