LLM Datasets

Instantly enhance AI model performance with high-quality, off-the-shelf datasets.

Type

All (16)
Image Caption (10)
SFT Datasets (1)
Pre-training Text (5)

21,998 Image Caption Data of Vehicles

21,998 image captions of vehicles, covering various types of cars, SUVs, MPVs, trucks, and buses. The images were collected by surveillance cameras on outdoor roads over multiple time periods. The captions are in English and mainly describe vehicle type, color, vehicle orientation, scene, and similar information.
multi-modality, vehicle attribute data, security data, intelligent monitoring data, intelligent traffic data, smart city data

1 Million Pairs Image Caption Data Of General Scenes

1 million pairs of images and descriptions. The images cover various categories, including landscapes, animals, flowers and trees, people, cars, sports, industry, and architecture, along with an aesthetic subset. The descriptions cover the overall scene of the image, the details within the scene, and the emotions conveyed by the image. Descriptions are provided in both English and Chinese.
Text description, multi-modality, general scene data set, English caption, Chinese caption

10,000 Image Caption Data of Diverse Scenes

10,000 image captions of diverse scenes, including natural scenes, urban street scenes, exhibitions, family environments, and other scenes. The images were shot with different brands of cameras across multiple time periods and shooting angles. The captions are in English and mainly describe the main scene in each image, usually including both foreground and background.
multi-modality, natural scene data set, scene information data

10,100 Image Caption Data of Human Face

10,100 image captions of human faces, covering multiple races and four age groups: under 18, 18 to 45, 46 to 60, and over 60. The collection scenes include both indoor and outdoor settings. The image content is varied, including subjects wearing masks, glasses, or headphones, as well as facial expressions, gestures, and adversarial examples. The captions are in English and mainly describe race, gender, age, shooting angle, lighting, and other diverse content.
multi-modal, multi-pose face image data, face dataset

11,000 Image & Video Caption Data of Human Action

11,000 Image & Video caption data of human action contains 10,000 images and 10,000 videos of various human behaviors across different seasons and shooting angles, in both indoor and outdoor scenes. The captions are in English and mainly describe the gender, age, clothing, behavior, and body movements of the people shown.
AIGC, human behavior data, behavior recognition data, human behavior recognition data, human detection data

90,000 sets – Multi-domain Customer Service Dialogue Text Data

Multi-domain customer service dialogue text data, 90,000 sets in total, spanning multiple domains including telecommunications, e-commerce, finance, lifestyle, business, education, healthcare, and entertainment. Each set consists of a single-turn or multi-turn conversation. This dataset can be used for tasks such as LLM training, chatgpt
Customer Service Dialogue text data, telecommunications topics data, commerce topics data, finance topics data, LLM data, Large Language Model data, chatgpt data

300 Million Pairs - High-Quality Image-Caption Dataset

300 million images, each paired with a description. All are genuine works published by photographers. The vast majority of descriptions are in English, with very few in Chinese.
multimodal, image description

7 Million Sets - High-Quality Video Caption Dataset

7 million genuine, high-quality videos, all released by photographers around the world. 6 million of them are described in English and 1 million in Chinese. They cover a variety of categories such as people, landscapes, and animals. The resolution is above 1080p.
multimodal, video description, caption, LLM dataset

10 Million - English Test Questions Text Parsing And Processing Data

10 million English test questions with text parsing and processing. Each question contains a title, answer, parse, subject, grade, and question type. The educational stages cover primary school, middle school, high school, and university. Subjects cover mathematics, biology, accounting, etc. The questions follow the Anglo-American education system and can be used to enhance the subject knowledge of large models.
English test questions, text data, LLM, Large Language Model, Large Model, chatgpt data

Tailor Your Data Now

Why Off-the-Shelf Datasets

  • Copyright

    Clear Copyright and Ready to Check
  • Security

    Properly Authorized, Secure to Use
  • Professional

    Designed and Produced by AI Data Experts
  • Diversity

    Collected from a Variety of Real Scenes
  • Cost Effective

    More Cost-Efficient Than Tailored Data
  • Efficiency

    Ready to Go, Delivered in Seconds