200,000 Multilingual Text Dataset in French, German, Spanish & Italian for NLP Training

This dataset contains 200,000 pieces of high-quality multilingual text content, evenly distributed across four languages: French, German, Spanish, and Italian (50,000 per language). The text samples span over 200 categories such as architecture, animals, automobiles, food & beverage, movies, zodiac signs, and cybersecurity. Designed to support a variety of natural language processing (NLP) tasks, this dataset is ideal for multilingual language model fine-tuning, cross-lingual classification, machine translation, and generative AI applications. All content is clean, well-formatted, and suitable for commercial and academic AI research.

Paid Datasets
This is a paid dataset for commercial use, research, and more. Licensed, ready-made datasets help jump-start AI projects.
Specifications
Data content: 200,000 text samples in French, German, Spanish, and Italian
Category: more than 200 categories, such as architecture, animals, automobiles, food & beverage, movies, zodiac signs, and cybersecurity
Data volume: 50,000 samples per language (French, German, Spanish, Italian)
Languages: French, German, Spanish, Italian
Fields: contents, category
Format: JSON
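
The specification lists two fields (contents, category) and a JSON format, but not the exact file layout. The following is a minimal sketch of how such records might be loaded and checked, assuming each file holds a top-level JSON array of objects. The function names and the sample records are hypothetical placeholders, not taken from the dataset.

```python
import json
from collections import Counter

# Field names come from the specification; everything else is assumed.
REQUIRED_FIELDS = ("contents", "category")


def parse_records(raw_json):
    """Parse a JSON array and keep records whose required fields are non-empty strings."""
    records = json.loads(raw_json)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array (assumed layout)")
    valid = []
    for record in records:
        if not isinstance(record, dict):
            continue
        if all(
            isinstance(record.get(field), str) and record[field].strip()
            for field in REQUIRED_FIELDS
        ):
            valid.append(record)
    return valid


def category_counts(records):
    """Count how many records fall under each category label."""
    return Counter(record["category"] for record in records)


# Made-up placeholder records, used only to exercise the functions.
SAMPLE_JSON = json.dumps([
    {"contents": "Texte d'exemple.", "category": "architecture"},
    {"contents": "Beispieltext.", "category": "animals"},
    {"contents": "Texto de ejemplo.", "category": "architecture"},
    {"contents": "", "category": "movies"},   # empty text: dropped
    {"category": "movies"},                   # missing field: dropped
])

records = parse_records(SAMPLE_JSON)
counts = category_counts(records)
print(len(records), dict(counts))
```

Counting records per category this way is a quick check that a delivered file covers the expected categories before it is used for fine-tuning or classification.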