
m.nexdata.datatang.com


Korean Financial Speech Dataset – 215 Hours of Real-World Audio


This Korean Financial Speech Dataset contains 215 hours of real-world audio, including casual conversations and monologues. The content covers professional financial terminology in macroeconomic and microeconomic contexts, simulating authentic banking and financial-service interactions. Each recording includes a transcription, speaker metadata (ID and gender), and tagged financial entities. The dataset supports AI applications such as automatic speech recognition (ASR), financial natural language understanding (NLU), voicebot development, and domain-specific language modeling. All data complies with GDPR, CCPA, and PIPL regulations to ensure privacy and ethical use.

This is a paid dataset licensed for commercial use, research, and other purposes. Licensed, ready-made datasets help jump-start AI projects.

Specifications


Format: 16 kHz, 16-bit, WAV, mono channel

Content category: Covers a wide range of professional financial terminology, focusing primarily on macroeconomics (market trends, financial policies, etc.) and microeconomics (individual enterprises, stocks, investment portfolios, etc.)

Recording condition: Low background noise

Country: Korea (KOR)

Language (Region) Code: ko-KR

Language: Korean

Annotation features: Transcription text, timestamps, speaker identification, gender, noise, PII redaction, entities, letter case

Accuracy: Word Accuracy Rate (WAR) of at least 98%
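The stated audio format (16 kHz, 16-bit, mono WAV) can be checked before training. The following is a minimal sketch using only Python's standard-library `wave` module; it assumes the files are uncompressed PCM WAV (the only kind `wave` reads), and the helper names `check_wav_format` and `make_test_wav` are hypothetical, introduced only for illustration.

```python
import io
import wave

# Expected format taken from the specification: 16 kHz, 16-bit, mono WAV.
EXPECTED_RATE = 16_000
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample (16 bit)
EXPECTED_CHANNELS = 1


def check_wav_format(source) -> list[str]:
    """Return a list of mismatches; an empty list means the file matches the spec.

    `source` may be a file path or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wf:
        if wf.getframerate() != EXPECTED_RATE:
            problems.append(f"sample rate {wf.getframerate()} Hz, expected {EXPECTED_RATE}")
        if wf.getsampwidth() != EXPECTED_SAMPLE_WIDTH:
            problems.append(f"sample width {wf.getsampwidth() * 8} bit, expected 16")
        if wf.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wf.getnchannels()} channels, expected mono")
    return problems


def make_test_wav(rate=16_000, width=2, channels=1, frames=160) -> io.BytesIO:
    """Hypothetical helper: build a short, silent in-memory WAV file for testing."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(width)
        wf.setframerate(rate)
        wf.writeframes(b"\x00" * (frames * width * channels))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    print(check_wav_format(make_test_wav()))           # []
    print(check_wav_format(make_test_wav(rate=8000)))  # ['sample rate 8000 Hz, expected 16000']
```

For real files, pass a path instead of the in-memory buffer, e.g. `check_wav_format("recording.wav")` (a hypothetical filename). Returning a list of mismatches rather than raising makes it easy to scan a whole directory and report every nonconforming file at once.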

Sample


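Audio: 지난 주에 또 시끄러웠던 게 오염수가 아니라 이제 또 홍범도 장군 (English: "Last week, what caused a stir again wasn't the contaminated water but, this time, General Hong Beom-do")
Audio: 얘기가 진짜 많이 [OVERLAP/]나왔었는데[/OVERLAP] (English: "there was really a lot of talk about it"; the tagged word marks overlapping speech)
Audio: 예 어쨌든 간에 또 일요일이 돌아왔고 (English: "Yes, anyway, Sunday has come around again, and")
Audio: 반갑습니다. (English: "Nice to see you.")
Audio: 얘기가 나오다 보니까 (English: "Since the topic came up")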
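The specifications quote a Word Accuracy Rate (WAR) of at least 98% but do not define it. A common convention, assumed here, is WAR = (N - S - D - I) / N = 1 - WER, where N is the number of reference words and S, D, and I are the substitutions, deletions, and insertions in a minimum-edit alignment. The sketch below makes further assumptions: Korean words are treated as whitespace-delimited units (eojeol), inline tags such as `[OVERLAP/]...[/OVERLAP]` seen in the sample transcripts are stripped before scoring, and basic punctuation is ignored. `normalize`, `word_edit_distance`, and `word_accuracy_rate` are hypothetical helpers, not part of any vendor tooling.

```python
import re

# Assumed tag syntax, based on the sample transcript: [OVERLAP/] ... [/OVERLAP]
TAG_PATTERN = re.compile(r"\[/?[A-Z]+/?\]")
# Assumed normalization: ignore basic sentence punctuation when scoring.
PUNCT_PATTERN = re.compile(r"[.,?!]")


def normalize(text: str) -> list[str]:
    """Strip annotation tags and punctuation, then split into whitespace-delimited words."""
    text = TAG_PATTERN.sub("", text)
    text = PUNCT_PATTERN.sub("", text)
    return text.split()


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum number of word substitutions, deletions, and insertions (Levenshtein)."""
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution, or match at zero cost
            )
        prev = cur
    return prev[-1]


def word_accuracy_rate(reference: str, hypothesis: str) -> float:
    """WAR = 1 - (S + D + I) / N, assuming WAR is defined as 1 - WER."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript has no words")
    return 1.0 - word_edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    reference = "얘기가 진짜 많이 [OVERLAP/]나왔었는데[/OVERLAP]"
    hypothesis = "얘기가 진짜 많이 나왔는데"  # one substituted word out of four
    print(word_accuracy_rate(reference, hypothesis))  # 0.75
```

Because insertions count against the score, WAR computed this way can drop below zero for very poor hypotheses. For a corpus-level figure, edits and reference-word counts are normally summed over all utterances before dividing, rather than averaging per-utterance scores.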

Recommended Datasets


283 Hours - Indonesian (Indonesia) Spontaneous Dialogue Telephony Speech Dataset

An Indonesian (Indonesia) spontaneous dialogue telephony speech dataset, collected from dialogues on given topics covering 20+ domains. Transcriptions include text content, speaker ID, gender, age, and other attributes. The data was collected from a large, geographically diverse pool of speakers (376 native speakers), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets are GDPR-, CCPA-, and PIPL-compliant.


163 Hours Russian Children Speech Dataset – Real-World Speech Data for AI Training

A 163-hour Russian children's speech dataset featuring real-world conversational and monologue recordings. The dataset captures natural speech from children aged 12 and under, reflecting authentic communication patterns in real-world scenarios. All audio samples are transcribed and include rich metadata such as speaker ID, gender, age, and accent. The data was collected from diverse speakers across multiple geographic regions, enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets are GDPR-, CCPA-, and PIPL-compliant.


162 Hours - French (France) Children Real-World Casual Conversation and Monologue Speech Dataset

A French (France) children's real-world casual conversation and monologue speech dataset that mirrors real-world interactions. Transcriptions include text content, speaker ID, gender, age, accent, and other attributes. The data was collected from a large, geographically diverse pool of speakers (children aged 12 and under), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets are GDPR-, CCPA-, and PIPL-compliant.


346 Hours - Spanish (Mexico) Spontaneous Dialogue Smartphone Speech Dataset

A Spanish (Mexico) spontaneous dialogue smartphone speech dataset, collected from dialogues on given topics covering 20+ domains. Transcriptions include text content, speaker ID, gender, age, and other attributes. The data was collected from a large, geographically diverse pool of speakers (338 native speakers), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets are GDPR-, CCPA-, and PIPL-compliant.


80 Hours - French (Canada) Spontaneous Dialogue Smartphone Speech Dataset

A French (Canada) spontaneous dialogue smartphone speech dataset, collected from dialogues on given topics covering 20+ domains. Transcriptions include text content, speaker ID, gender, age, and other attributes. The data was collected from a large, geographically diverse pool of speakers (126 native speakers), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets are GDPR-, CCPA-, and PIPL-compliant.


406 Hours - Portuguese (European) Spontaneous Dialogue Smartphone Speech Dataset

A Portuguese (Portugal) spontaneous dialogue smartphone speech dataset, collected from dialogues on given topics covering 20+ domains. Transcriptions include text content, speaker ID, gender, age, and other attributes. The data was collected from a large, geographically diverse pool of speakers (590 native speakers), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets are GDPR-, CCPA-, and PIPL-compliant.


101 Hours - Italian (Italy) Children Real-World Casual Conversation and Monologue Speech Dataset

An Italian (Italy) children's real-world casual conversation and monologue speech dataset that mirrors real-world interactions. Transcriptions include text content, speaker ID, gender, age, accent, and other attributes. The data was collected from a large, geographically diverse pool of speakers (children aged 12 and under), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets are GDPR-, CCPA-, and PIPL-compliant.


97 Hours – German Children Speech Dataset (Conversations & Monologues)

A 97-hour German children's speech dataset of conversations and monologues. Transcriptions include text content, speaker ID, gender, age, accent, and other attributes. The data was collected from a large, geographically diverse pool of speakers (children aged 12 and under), enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, protecting user privacy and legal rights throughout data collection, storage, and usage; all our datasets are GDPR-, CCPA-, and PIPL-compliant.



Sharpen Your AI with Better Data

+1(626)594-5598



Copyright © 2023 NEXDATA TECHNOLOGY INC
