1464 Hours Large-Scale Canadian French Speech Dataset for AI Training
canadian french speech dataset
canadian french asr dataset
french dialogue dataset
french speech dataset
french canadian speech dataset
This dataset contains 1,464 hours of Canadian French conversational and monologue speech collected from real-world scenarios, including user-generated content, daily conversations, variety shows, and other general domains. It includes transcriptions, speaker IDs, gender labels, and additional metadata. The speech was collected from speakers with diverse geographical and personal backgrounds, which helps models trained on it perform well on complex real-world tasks. The dataset has undergone quality validation by multiple AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and use. All of our datasets comply with GDPR, CCPA, and PIPL.
This is a paid dataset for commercial use, research, and other purposes. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Format
16 kHz, 16-bit, WAV, mono channel
Content category
Interviews, self-media, variety shows, etc.
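The Format row above (16 kHz, 16-bit, mono WAV) can be checked on a delivered file before it enters a training pipeline. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` and the path `sample.wav` are illustrative assumptions, not part of the dataset's tooling.

```python
import wave

# Expected parameters, taken from the Format row: 16 kHz, 16-bit, mono WAV.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits = 2 bytes per sample
EXPECTED_CHANNELS = 1


def check_wav_format(path):
    """Return a list of mismatches between the file and the listed format.

    An empty list means the file header matches all three parameters.
    `check_wav_format` is a hypothetical helper written for this example.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {8 * wav.getsampwidth()} bits")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"channel count is {wav.getnchannels()}")
    return problems
```

Calling `check_wav_format("sample.wav")` on a conforming file should return an empty list. The check reads only the WAV header, so it is cheap enough to run over every file in a delivery.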