Write a Dataset YAML
The single config file that connects your folders to YOLO's training loop.
Ultralytics YOLO finds the rest of your detection dataset through one file: data.yaml. It points at the training and validation folders and lists your classes. It is short — usually under 20 lines — and it is the only thing standing between your prepared data and model.train().
Author a data.yaml that points Ultralytics YOLO at your dataset and lists your classes.
- path: is the root of the dataset.
- train: and val: are paths to image folders, relative to path.
- names: is a dict of class index → name (must match the class indices in the label files).
- Save the file anywhere; pass its path to model.train(data=…).
Hands-on
The minimal YAML

# Dataset root (absolute or relative to where you run training)
path: /home/me/datasets/my_dataset
# Image folders (relative to path)
train: images/train
val: images/val
# test: images/test # optional
# Classes (index → name)
names:
  0: forklift
  1: person
  2: pallet

That's enough for YOLO to:
- Locate your images.
- Find the matching labels by replacing images/ with labels/.
- Decode the integer at the start of each label line into a class name.
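The image-to-label lookup can be sketched in a few lines of Python. This illustrates the rule only and is not Ultralytics' own code: the helper name image_to_label_path is hypothetical, and the sketch assumes POSIX-style absolute paths in which the last /images/ segment is swapped for /labels/ and the file extension becomes .txt.

```python
from pathlib import PurePosixPath


def image_to_label_path(image_path: str) -> str:
    """Map an image path to the label path expected for it (hypothetical helper)."""
    head, sep, tail = image_path.rpartition("/images/")
    if not sep:
        raise ValueError(f"no '/images/' segment in {image_path!r}")
    # Swap the last images/ directory for labels/ and the extension for .txt.
    return f"{head}/labels/{PurePosixPath(tail).with_suffix('.txt')}"


print(image_to_label_path("/home/me/datasets/my_dataset/images/train/img_001.jpg"))
```

In practice this means an images/train folder needs a sibling labels/train folder holding one .txt file per labeled image, with the same file stem.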
The coco8 example is a great minimal reference if you want a working YAML to copy from, and the VOC dataset YAML shows how a larger one is structured.
Class indices and names
The integer at the start of a label line is an index into names. So in the example above, a label starting with 0 means a forklift. Mismatches between label indices and YAML names fail silently: the model trains and val mAP looks fine, but the names on the output are wrong. (Bigger projects often start from the full COCO dataset or browse the datasets overview for a closer-fit starter.)
A useful hygiene practice:
- Treat the YAML as the source of truth for class indices.
- Use the same indices everywhere — labeling tool, conversion script, deployment code.
- Never reorder names after you've labeled data. Add new classes only at the end.
If you swap names 0 and 1 in the YAML without relabeling, every label in your dataset is now wrong. The model will train but predict the wrong classes. There's no warning. Always add new classes at the end and leave old ones alone.
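Since nothing warns you about index problems, a small audit script can help. The following is a minimal sketch, assuming label files are plain .txt files in which each non-blank line starts with an integer class index. The function name audit_label_indices is hypothetical, and names is passed in as a Python dict rather than parsed from the YAML, to keep the sketch free of third-party dependencies.

```python
from collections import Counter
from pathlib import Path


def audit_label_indices(labels_dir, names):
    """Count class indices in YOLO label files and list any that names lacks."""
    counts = Counter()
    for label_file in sorted(Path(labels_dir).rglob("*.txt")):
        for line in label_file.read_text().splitlines():
            fields = line.split()
            if fields:  # skip blank lines
                counts[int(fields[0])] += 1
    # Indices that appear in the labels but have no entry in names.
    unknown = sorted(idx for idx in counts if idx not in names)
    return counts, unknown
```

Calling audit_label_indices("my_dataset/labels", {0: "forklift", 1: "person", 2: "pallet"}) returns per-class counts plus any indices missing from names. It cannot detect two names that have been swapped, but a class with an unexpectedly low or high count is often a hint that the labeling tool and the YAML disagree.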
Multiple validation sets
You might also want a held-out test set (the optional test: key). To validate on more than one set, pass a list:
val:
  - images/val_easy
  - images/val_hard

Common pattern: a curated "regression" val set you control, plus a freshly sampled "production-realistic" one. The data collection and annotation guide has more advice on splitting data and avoiding class imbalance.
Pointing at a remote / shared dataset
path: accepts absolute paths and URLs:
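path: https://ultralytics.com/assets/coco8.zip # auto-downloads & extracts

Internally Ultralytics fetches and caches the zip. Useful when several people work off the same starter dataset. The bundled coco8.yaml declares its archive URL under a separate download: key, so if a URL in path: is not picked up by your version, try that key instead. Every other YAML key (augmentation, sampler, cache mode) is documented in the configuration reference.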
Verify the YAML
The fastest way to verify is to run validation on a pretrained model — even though it has no idea about your classes, it confirms the layout is readable:
yolo val model=yolo26n.pt data=my_dataset/data.yaml

If you see No labels found or No images found, the YAML or layout is wrong. Fix it before going to training.
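When one of those messages does appear, a quick look at the folders usually finds the cause. The sketch below is one way to do that in plain Python, under stated assumptions: image folders live under images/ with labels mirrored under labels/, and only the extensions listed in IMAGE_SUFFIXES are counted. The names check_split and IMAGE_SUFFIXES are hypothetical and not part of Ultralytics.

```python
from pathlib import Path

# Assumed subset of image extensions for this sketch; adjust as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def check_split(root, image_dir):
    """Return (number of images, images with no label file) for one split."""
    root = Path(root)
    images = sorted(
        p for p in (root / image_dir).rglob("*") if p.suffix.lower() in IMAGE_SUFFIXES
    )
    missing = []
    for image in images:
        rel = image.relative_to(root)
        # Mirror the images/ -> labels/ convention on the first path component.
        label = root / "labels" / Path(*rel.parts[1:]).with_suffix(".txt")
        if not label.exists():
            missing.append(image)
    return len(images), missing
```

Running check_split("/home/me/datasets/my_dataset", "images/train") with zero images points at a wrong path: or train: value, while a split where every image is missing its label points at a misplaced labels/ folder. A few unlabeled images can be legitimate, since images without a label file are treated as background.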
Write a data.yaml for your dataset and run yolo val model=yolo26n.pt data=my_dataset/data.yaml. The mAP will be low (we're using a model that doesn't know your classes), but the output should at least list your classes — that proves the YAML is connected.
- Your data.yaml lists path, train, val, and names.
- yolo val model=yolo26n.pt data=… finds your images and labels.
- Class indices in your label files match the YAML's names.
Solution
path: ./datasets/my_dataset
train: images/train
val: images/val
names:
  0: forklift
  1: person
  2: pallet

Dataset prepared, YAML in hand. Let's actually train.