en

Please fill in your name

Mobile phone format error

Please enter the telephone

Please enter your company name

Please enter your company email

Please enter the data requirement

Successful submission! Thank you for your support.

Format error, Please fill in again

Confirm

The data requirement cannot be less than 5 words and cannot be pure numbers

High-Quality Training Datasets

Boost the performance of your AI models with our high-quality, ready-to-use training datasets.

Language

All

Data Type

All

203 Hours - Korean(Korea) Medical Entities Real-world Casual Conversation and Monologue speech dataset

Korean(Korea) Medical Entities Real-world Casual Conversation and Monologue speech dataset, covering various medical professional terminologies, primarily focuses on medical consultation, medical education, medical academic conferences and lectures, etc., mirrors real-world interactions. Transcribed with text content, speaker's ID, gender, common entities and other attributes. Our dataset was collected from extensive and diversify speakers, geographicly speaking, enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.
Korean Entity Spontaneous Dialogue Medical

215 Hours - Korean(Korea) Financial Entities Real-world Casual Conversation and Monologue speech dataset

Korean(Korea) Financial Entities Real-world Casual Conversation and Monologue speech dataset, covering various financial professional terminologies, primarily focuses on macroeconomics and microeconomics, mirrors real-world interactions. Transcribed with text content, speaker's ID, gender, common entities and other attributes. Our dataset was collected from extensive and diversify speakers, geographicly speaking, enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.
Korean Entity Spontaneous Dialogue Financial

5,000 People Multi-race – Infrared Face Recognition Data

5,000 people multi-race – infrared face recognition data. The collecting scenes of this dataset include indoor scenes and outdoor scenes. The data includes male and female. The race distribution includes Asian, Black, Caucasian and Brown people. The age distribution ranges from child to the elderly, the young people and the middle aged are the majorities. The collecting device is DV-DH4,044S305AD. The data diversity includes multiple age periods, multiple facial postures, multiple scenes. The data can be used for tasks such as infrared face recognition. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.
Multi-race infrared face binocular camera multiple age periods multiple facial postures multiple scenes

Millions of Foreign Face Data_Single Image

Millions of Foreign Face Data. One person have one frontal face image. The race distribution includes Asian, Black, Caucasian and brown people, the age distribution is ranging from infant to the elderly, the middle-aged and young people are the majorities. The collection environment includes indoor and outdoor scenes. The data diversity includes multiple age periods, multiple scenes, multiple facial postures and multiple expressions. The data can be used for tasks such as face recognition. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.
Million id foreign face single image

4 People - Northeastern dialect Average Tone Speech Synthesis Corpus

4 People - Northeastern dialect Average Tone Speech Synthesis Corpus. It is recorded by Northeast native. About 40% of the corpus contains words unique to Northeast China, the phonemes and tones are balanced. Professional phonetician participates in the annotation. It precisely matches with the research and development needs of the speech synthesis.
Synthesis Corpus TTS Average Tone General Northeast Dialect

373 Hours - Dari(Afghanistan) Spontaneous Dialogue Smartphone speech dataset

Dari(Afghanistan) Spontaneous Dialogue Smartphone speech dataset, collected from dialogues based on given topics, covering 20+ domains. Transcribed with text content, speaker's ID, gender, age and other attributes. Our dataset was collected from extensive and diversify speakers(504 native speakers), geographicly speaking, enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.
asr audio conversational asr data dari

100,000 Instruction-Following Evaluation SFT for Chinese LLM Text Data

100,000 Instruction-Following Evaluation SFT for Chinese LLM Text Data. Between 50 and 400 words, with no fewer than 3 constraints in each prompt.All prompt are manually written to satisfy the diversity of coverage.
LLM Instruction-Following SFT

103 Hours - Indonesian(Indonesia) Spontaneous Dialogue Smartphone speech dataset

Indonesian(Indonesia) Spontaneous Dialogue Smartphone speech dataset, collected from dialogues based on given topics, covering 20+ domains. Transcribed with text content, speaker's ID, gender, age and other attributes. Our dataset was collected from extensive and diversify speakers(168 native speakers), geographicly speaking, enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.
audio data dataset conversational asr data Indonesian

310 Hours - Turkish Scripted Monologue Smartphone Speech Dataset

Turkish Scripted Monologue Smartphone Speech Dataset, collected from monologue based on given scripts. Transcribed with text content. Our dataset was collected from extensive and diversify speakers(223 people in total, from turkey), geographicly speaking, enhancing model performance in real and complex tasks.rnQuality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.
Turkey Turkish phone speech

183 Hours - Urdu Spontaneous Dialogue Smartphone speech dataset

Urdu Spontaneous Dialogue Smartphone speech dataset, collected from dialogues based on given topics, covering 20+ domains. Transcribed with text content, speaker's ID, gender, age and other attributes. Our dataset was collected from extensive and diversify speakers(240 native speakers), geographicly speaking, enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.
urdu phone dialogue

175 Hours - Thai(Thailand) Spontaneous Dialogue Smartphone speech dataset

Thai(Thailand) Spontaneous Dialogue Smartphone speech dataset, collected from dialogues based on given topics, covering 20+ domains. Transcribed with text content, speaker's ID, gender, age and other attributes. Our dataset was collected from extensive and diversify speakers(322 native speakers), geographicly speaking, enhancing model performance in real and complex tasks. Quality tested by various AI companies. We strictly adhere to data protection regulations and privacy standards, ensuring the maintenance of user privacy and legal rights throughout the data collection, storage, and usage processes, our datasets are all GDPR, CCPA, PIPL complied.
Thai dialogue smartphone

2 Million Pairs Image Caption Data Of General Scenes

2 million pairs of images and descriptions, the pictures cover various categories, including landscapes, animals, flowers and trees, people, cars, sports, industry, and architecture, along with an aesthetic subset. They depict the overall scene of the image, the details within the scene, and the emotions conveyed by the image. The description is provided in both English and Chinese languages.
Text description multi-modality general scene data set English caption Chinese caption
. . .

loading

5533eceb-dbb5-4801-b263-fcb0432e926e