AI Training Dataset Examples
For each AI training type below, this page shows when it is used, how the dataset ZIP is structured, and what the actual data files look like.
Text & Language Models
Business case: Classify, route, or analyze text such as emails, tickets, reviews, and chat messages.
ZIP Structure
nlp_sample.zip
├── train/data.csv
├── val/data.csv
├── test/data.csv
├── metadata.json
└── README.md
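A quick way to confirm an archive follows this layout is to compare its member list against the expected paths. The sketch below is illustrative only: it assumes the files sit at the root of the ZIP (no wrapping folder), and `missing_members` is a hypothetical helper name.

```python
import io
import zipfile

# Expected members, taken from the ZIP structure listed above.
# Assumption: paths are relative to the archive root.
EXPECTED = [
    "train/data.csv",
    "val/data.csv",
    "test/data.csv",
    "metadata.json",
    "README.md",
]

def missing_members(zip_bytes: bytes) -> list:
    """Return the expected paths that are absent from the archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = set(archive.namelist())
    return [path for path in EXPECTED if path not in names]

# Build a small in-memory archive that lacks README.md, for illustration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    for path in EXPECTED[:-1]:
        archive.writestr(path, "")

missing = missing_members(buffer.getvalue())
print(missing)
```

An empty result means every expected file is present.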
train/data.csv
text,label
"I forgot my password",support
"I want a refund",billing
"This is spam",spam
Business case: Build custom AI assistants that follow company rules, tone, and knowledge.
train/data.jsonl
{"messages":[{"role":"user","content":"Reset my password"},{"role":"assistant","content":"Click 'Forgot Password'."}]}
{"messages":[{"role":"user","content":"What are your hours?"},{"role":"assistant","content":"We are open 9am–5pm."}]}
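Each line of this file is one JSON object holding a `messages` list. A small validator can catch malformed lines before training. This sketch makes two assumptions that the page does not state: the allowed roles are `system`, `user`, and `assistant`, and every example ends with an assistant turn. `validate_chat_line` is a hypothetical name.

```python
import json

# Sample lines copied from train/data.jsonl above.
SAMPLE = '''{"messages":[{"role":"user","content":"Reset my password"},{"role":"assistant","content":"Click 'Forgot Password'."}]}
{"messages":[{"role":"user","content":"What are your hours?"},{"role":"assistant","content":"We are open 9am–5pm."}]}
'''

ALLOWED_ROLES = {"system", "user", "assistant"}  # assumption, not from the page

def validate_chat_line(line):
    """Parse one JSONL line and return its messages, or raise ValueError."""
    messages = json.loads(line).get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("'messages' must be a non-empty list")
    for message in messages:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unexpected role: {message.get('role')!r}")
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            raise ValueError("'content' must be a non-empty string")
    # Assumption: the last turn is the reply the model should learn.
    if messages[-1]["role"] != "assistant":
        raise ValueError("last message should come from the assistant")
    return messages

conversations = [
    validate_chat_line(line) for line in SAMPLE.splitlines() if line.strip()
]
print(len(conversations))
```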
Business case: Semantic search, similarity matching, clustering, and RAG systems.
data.jsonl
{"text":"Resetting your password"}
{"text":"Refund policy details"}
{"text":"Troubleshooting login errors"}
Business case: Generate marketing copy, emails, reports, and creative content.
data.jsonl
{"prompt":"Write a welcome email","completion":"Welcome to our platform!"}
{"prompt":"Describe our product","completion":"Our product helps teams work faster."}
Business case: Transcribe calls, meetings, voice notes, and commands.
transcripts.csv
file,text
audio001.wav,"I need help with my order"
audio002.wav,"Please cancel my subscription"
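Because the transcript file refers to audio by file name, it helps to check that every listed file actually ships with the dataset. A minimal sketch, assuming the set of available file names has already been collected (for example from a directory listing); `unmatched_audio` is a hypothetical helper.

```python
import csv
import io

# Sample rows copied from transcripts.csv above.
SAMPLE = '''file,text
audio001.wav,"I need help with my order"
audio002.wav,"Please cancel my subscription"
'''

def unmatched_audio(manifest, available_files):
    """Return file names listed in the manifest but not in available_files."""
    return [
        row["file"]
        for row in csv.DictReader(manifest)
        if row["file"] not in available_files
    ]

# Pretend only the first clip is present, for illustration.
missing_audio = unmatched_audio(io.StringIO(SAMPLE), {"audio001.wav"})
print(missing_audio)
```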
Business case: Voice assistants, IVR systems, accessibility tools.
text.csv
text,file
"Welcome to our service",welcome.wav
"Your order has shipped",shipping.wav
Image & Vision Models
Business case: Visual inspection, monitoring, and image analysis.
ZIP Structure
images/
├── img001.jpg
├── img002.jpg
Business case: Categorize images (products, documents, defects).
Folder Layout
train/
├── cat/
└── dog/
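In a layout like this, the folder name acts as the label. The sketch below walks `train/` and pairs each file with its parent folder name. The image file names are hypothetical, created in a temporary directory purely so the example can run on its own.

```python
import tempfile
from pathlib import Path

def list_labeled_images(train_dir):
    """Return (file name, label) pairs, using each subfolder name as the label."""
    samples = []
    for class_dir in sorted(p for p in train_dir.iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            if image_path.is_file():
                samples.append((image_path.name, class_dir.name))
    return samples

with tempfile.TemporaryDirectory() as tmp:
    train_dir = Path(tmp) / "train"
    # Hypothetical file names; only the cat/ and dog/ folders come from the layout.
    for label, name in [("cat", "cat001.jpg"), ("dog", "dog001.jpg")]:
        (train_dir / label).mkdir(parents=True)
        (train_dir / label / name).touch()
    samples = list_labeled_images(train_dir)

print(samples)
```

The same walk applies to any layout where one folder holds one class.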
Business case: Pass/fail or yes/no visual decisions.
Folder Layout
train/
├── yes/
└── no/
Business case: Locate and count objects like people, vehicles, or products.
annotations.json
{"image_id":1,"bbox":[120,200,60,90],"category":"person"}
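The page does not say how the four `bbox` numbers are ordered. The sketch below assumes the common `[x, y, width, height]` convention with the origin at the top-left corner; if the dataset uses `[x_min, y_min, x_max, y_max]` instead, the conversion is unnecessary. `to_corners` is a hypothetical helper.

```python
import json

# Annotation copied from annotations.json above.
LINE = '{"image_id":1,"bbox":[120,200,60,90],"category":"person"}'

def to_corners(bbox):
    """Convert [x, y, width, height] to (x_min, y_min, x_max, y_max)."""
    x, y, width, height = bbox
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return (x, y, x + width, y + height)

annotation = json.loads(LINE)
corners = to_corners(annotation["bbox"])
# Area in square pixels, under the same width/height assumption.
area = annotation["bbox"][2] * annotation["bbox"][3]
print(corners, area)
```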
Business case: Pixel-level precision for medical or industrial use.
Images & Masks
images/img001.png
masks/img001_mask.png
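The two paths above suggest a naming convention: a mask shares its image's base name, with `_mask` appended, and lives under `masks/`. Assuming that convention holds for every pair, the mask path can be derived from the image path, as in this sketch (`mask_path_for` is a hypothetical helper).

```python
from pathlib import PurePosixPath

def mask_path_for(image_path):
    """Derive the mask path from an image path, assuming the
    images/<name>.<ext> -> masks/<name>_mask.<ext> convention."""
    image = PurePosixPath(image_path)
    return str(PurePosixPath("masks") / f"{image.stem}_mask{image.suffix}")

mask_path = mask_path_for("images/img001.png")
print(mask_path)
```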
Business case: Creative assets, marketing visuals, concept art.
prompts.jsonl
{"prompt":"A futuristic city at sunset"}
{"prompt":"A cat wearing sunglasses"}
Business case: Marketing videos, simulations, training content.
prompts.jsonl
{"prompt":"A cinematic sunset over mountains"}
Business case: Visual search, image Q&A, multimodal understanding.
data.jsonl
{"image":"img001.jpg","question":"What animal is this?","answer":"dog"}
Audio, Data & Decision Models
Business case: Detect alarms, machine faults, or sound events.
labels.csv
file,label
alarm.wav,alarm
engine.wav,engine_noise
Business case: Combine text, image, and audio for smarter decisions.
data.jsonl
{"image":"img.jpg","audio":"sound.wav","text":"Dog barking"}
Business case: Forecast demand, sales, or traffic trends.
data.csv
timestamp,value
2024-01-01,120
2024-01-02,135
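Forecasting data is only useful in time order, so a loader would typically parse the timestamps and sort by them. A minimal sketch, assuming ISO `YYYY-MM-DD` dates as in the sample and numeric values; `load_series` is a hypothetical helper.

```python
import csv
import io
from datetime import date

# Sample rows copied from data.csv above.
SAMPLE = '''timestamp,value
2024-01-01,120
2024-01-02,135
'''

def load_series(handle):
    """Return (date, value) pairs sorted by date."""
    series = [
        (date.fromisoformat(row["timestamp"]), float(row["value"]))
        for row in csv.DictReader(handle)
    ]
    series.sort(key=lambda point: point[0])
    return series

series = load_series(io.StringIO(SAMPLE))
# Change between consecutive observations, a common first feature.
changes = [later[1] - earlier[1] for earlier, later in zip(series, series[1:])]
print(changes)
```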
Business case: Product and content personalization.
interactions.csv
user_id,item_id,rating
1,product_9,5
2,product_3,3
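Recommendation pipelines usually start by grouping interactions per user. The sketch below builds a nested mapping from user to item to rating. Treating `user_id` and `rating` as integers is an assumption based on the sample rows.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from interactions.csv above.
SAMPLE = '''user_id,item_id,rating
1,product_9,5
2,product_3,3
'''

# user_id -> {item_id: rating}
ratings_by_user = defaultdict(dict)
for row in csv.DictReader(io.StringIO(SAMPLE)):
    ratings_by_user[int(row["user_id"])][row["item_id"]] = int(row["rating"])

print(dict(ratings_by_user))
```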
Business case: Fraud detection and system monitoring.
data.csv
value,is_anomaly
120,0
999,1
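Anomalies are rare in real data, so checking the share of flagged rows is a sensible first step. This sketch assumes `is_anomaly` only ever holds `0` or `1`, as in the sample.

```python
import csv
import io

# Sample rows copied from data.csv above.
SAMPLE = '''value,is_anomaly
120,0
999,1
'''

flags = [int(row["is_anomaly"]) for row in csv.DictReader(io.StringIO(SAMPLE))]
if any(flag not in (0, 1) for flag in flags):
    raise ValueError("is_anomaly must be 0 or 1")

# Fraction of rows flagged as anomalous.
anomaly_rate = sum(flags) / len(flags)
print(anomaly_rate)
```

The two-row sample is evenly split; production datasets are normally far more skewed toward normal rows.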
Business case: Optimize decisions via trial-and-error (robots, pricing, games).
episodes.jsonl
{"state":1,"action":0,"reward":1,"next_state":2}
{"state":2,"action":1,"reward":-1,"next_state":3}
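Each line records one transition. To illustrate how such records are consumed, the sketch below treats the two sample lines as consecutive steps of a single episode (an assumption, suggested by the first `next_state` matching the second `state`) and computes a discounted return. The discount factor `0.9` is an arbitrary illustrative value.

```python
import json

# Sample lines copied from episodes.jsonl above.
SAMPLE = '''{"state":1,"action":0,"reward":1,"next_state":2}
{"state":2,"action":1,"reward":-1,"next_state":3}
'''

def discounted_return(rewards, gamma):
    """Sum rewards with the k-th step weighted by gamma ** k."""
    total = 0.0
    # Working backwards avoids computing explicit powers of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total

transitions = [json.loads(line) for line in SAMPLE.splitlines() if line.strip()]

# Check that each step starts where the previous one ended.
is_chained = all(
    earlier["next_state"] == later["state"]
    for earlier, later in zip(transitions, transitions[1:])
)

episode_return = discounted_return([t["reward"] for t in transitions], gamma=0.9)
print(is_chained, round(episode_return, 6))
```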
Business case: Relationship-driven problems like fraud rings or networks.
edges.csv
source,target
user_1,user_2
user_2,user_3
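An edge list like this is usually turned into an adjacency structure before any graph model sees it. The sketch below treats edges as undirected, which is an assumption; for directed relationships, only the `source -> target` direction would be stored.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from edges.csv above.
SAMPLE = '''source,target
user_1,user_2
user_2,user_3
'''

# node -> set of neighbouring nodes
neighbors = defaultdict(set)
for row in csv.DictReader(io.StringIO(SAMPLE)):
    neighbors[row["source"]].add(row["target"])
    neighbors[row["target"]].add(row["source"])  # assumption: undirected edges

print({node: sorted(linked) for node, linked in neighbors.items()})
```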