Write a Dataset YAML

The single config file that connects your folders to YOLO's training loop.

Ultralytics YOLO finds the rest of your detection dataset through one file: data.yaml. It points at the training and validation folders and lists your classes. It is short — usually under 20 lines — and it is the only thing standing between your prepared data and model.train().

Outcome

Author a data.yaml that points Ultralytics YOLO at your dataset and lists your classes.

Fast Track
If you already know your way around, here's the short version.
  1. path: — root of the dataset.

  2. train: and val: — relative paths to image folders.

  3. names: — dict of class index → name (must match label file class indices).

  4. Save anywhere; pass the path to model.train(data=…).

Hands-on

The minimal YAML

# Dataset root (absolute or relative to where you run training)
path: /home/me/datasets/my_dataset

# Image folders (relative to path)
train: images/train
val: images/val
# test: images/test    # optional

# Classes (index → name)
names:
  0: forklift
  1: person
  2: pallet

That's enough for YOLO to:

  1. Locate your images.
  2. Find the matching labels by replacing images/ with labels/.
  3. Decode the integer at the start of each label line into a class name.
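
The last two steps can be sketched in a few lines of Python. This is a simplified illustration, not the actual Ultralytics loader: `label_path_for` and `decode_label_line` are hypothetical helper names, and the sketch assumes each label line reads `class x_center y_center width height`.

```python
from pathlib import PurePosixPath

# Same class map as the YAML above.
NAMES = {0: "forklift", 1: "person", 2: "pallet"}


def label_path_for(image_path: str) -> str:
    """Map an image path to its label path: images/ -> labels/, extension -> .txt."""
    parts = list(PurePosixPath(image_path).parts)
    # Find the last directory component named "images" (the file name is excluded).
    dir_indexes = [i for i, part in enumerate(parts[:-1]) if part == "images"]
    if not dir_indexes:
        raise ValueError(f"no images/ directory in {image_path!r}")
    parts[dir_indexes[-1]] = "labels"
    return str(PurePosixPath(*parts).with_suffix(".txt"))


def decode_label_line(line: str, names: dict[int, str]) -> tuple[str, tuple[float, ...]]:
    """Split one label line into (class name, box values)."""
    fields = line.split()
    class_index = int(fields[0])  # the integer at the start of the line
    box = tuple(float(value) for value in fields[1:])
    return names[class_index], box


print(label_path_for("my_dataset/images/train/0001.jpg"))
# my_dataset/labels/train/0001.txt
print(decode_label_line("0 0.5 0.5 0.2 0.3", NAMES))
# ('forklift', (0.5, 0.5, 0.2, 0.3))
```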

The coco8 example is a great minimal reference if you want a working YAML to copy from, and the VOC dataset YAML shows how a larger one is structured.

Class indices and names

The integer at the start of a label line is an index into names. In the example above, a label line starting with 0 means a forklift. A mismatch between label indices and YAML names fails silently: the model trains and val mAP looks fine, but the class names on the output are wrong. (Bigger projects often start from the full COCO dataset, or browse the datasets overview for a closer-fit starter.)
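
One way to catch part of this problem early is to scan the label files and compare the indices they use against names. The sketch below is an assumption-laden helper, not part of Ultralytics: the function names are hypothetical, and it only detects indices that have no entry in names. Two swapped names will pass this check, although the per-class counts can help you spot them by eye.

```python
from collections import Counter
from pathlib import Path


def count_label_classes(label_dir: str) -> Counter:
    """Count how often each class index appears across all .txt label files."""
    counts: Counter = Counter()
    for label_file in sorted(Path(label_dir).rglob("*.txt")):
        for line in label_file.read_text().splitlines():
            if line.strip():  # skip blank lines
                counts[int(line.split()[0])] += 1
    return counts


def unknown_class_indexes(counts: Counter, names: dict[int, str]) -> list[int]:
    """Return the indices used in labels that have no entry in names."""
    return sorted(index for index in counts if index not in names)


# Example usage (the labels directory is a placeholder for your own):
# counts = count_label_classes("my_dataset/labels")
# print(counts, unknown_class_indexes(counts, {0: "forklift", 1: "person", 2: "pallet"}))
```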

A useful hygiene practice:

  • Treat the YAML as the source of truth for class indices.
  • Use the same indices everywhere — labeling tool, conversion script, deployment code.
  • Never reorder names after you've labeled data. Add new classes only at the end.

Reordering names is destructive

If you swap names 0 and 1 in the YAML without relabeling, every label in your dataset is now wrong. The model will train but predict the wrong classes. There's no warning. Always add new classes at the end and leave old ones alone.
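
The append-only rule is easy to enforce mechanically, for instance in a test that compares the previous names against an edited version. A minimal sketch, assuming both versions are already loaded as Python dicts; `is_append_only` is a hypothetical helper and "truck" is a made-up class used only for illustration:

```python
def is_append_only(old_names: dict[int, str], new_names: dict[int, str]) -> bool:
    """True if every existing index keeps its name; new indices may be added."""
    return all(new_names.get(index) == name for index, name in old_names.items())


old = {0: "forklift", 1: "person", 2: "pallet"}
appended = {0: "forklift", 1: "person", 2: "pallet", 3: "truck"}  # new class at the end
swapped = {0: "person", 1: "forklift", 2: "pallet"}  # indices 0 and 1 reordered

print(is_append_only(old, appended))  # True
print(is_append_only(old, swapped))   # False
```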

Multiple validation sets

To validate on more than one set, pass a list of folders under val: (a held-out test set still goes under its own test: key):

val:
  - images/val_easy
  - images/val_hard

Common pattern: a curated "regression" val set you control, plus a freshly sampled "production-realistic" one. The data collection and annotation guide has more advice on splitting data and avoiding class imbalance.

Pointing at a remote / shared dataset

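In the dataset YAMLs that ship with Ultralytics (coco8.yaml, for example), the archive URL goes in a separate download: key, while path: still names the dataset root:

path: coco8   # dataset root
download: https://ultralytics.com/assets/coco8.zip   # auto-downloads & extracts

If the dataset is not found locally, Ultralytics fetches and extracts the zip. This is useful when several people work off the same starter dataset. Other settings, such as augmentation, sampler, and cache mode, are documented in the configuration reference.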

Verify the YAML

The fastest check is to run validation with a pretrained model. The model knows nothing about your classes, but a successful run confirms that the YAML parses and the folder layout is readable:

yolo val model=yolo26n.pt data=my_dataset/data.yaml

If you see No labels found or No images found, the YAML or layout is wrong — fix it before going to training.
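
If either message appears, a quick pre-flight count of images and matching label files can narrow down which split is broken. The sketch below is not an Ultralytics API. It assumes the config is already available as a plain dict that mirrors the YAML (loading the file itself needs a YAML parser, which is omitted here), that every split path starts with images/, and that the listed image extensions cover your data. `count_split` and `check_layout` are hypothetical names.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}  # assumption: extend as needed


def count_split(root: Path, image_dir: str) -> tuple[int, int]:
    """Return (images found, images that have a matching label file)."""
    images = [p for p in (root / image_dir).rglob("*") if p.suffix.lower() in IMAGE_SUFFIXES]
    labeled = 0
    for image in images:
        relative = image.relative_to(root)
        # Swap the leading images/ component for labels/ and the extension for .txt.
        label = root / "labels" / Path(*relative.parts[1:]).with_suffix(".txt")
        labeled += label.is_file()
    return len(images), labeled


def check_layout(config: dict) -> dict[str, tuple[int, int]]:
    """Count images and labels for every train and val folder in the config."""
    root = Path(config["path"])
    report = {}
    for key in ("train", "val"):
        value = config[key]
        # val (or train) may be a single folder or a list of folders.
        for image_dir in ([value] if isinstance(value, str) else value):
            report[image_dir] = count_split(root, image_dir)
    return report
```

A split that reports zero images or zero labels points at a wrong path or a misplaced labels folder. A labeled count slightly below the image count is not necessarily an error, since images without a label file are typically treated as background.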

Try It

Write a data.yaml for your dataset and run yolo val model=yolo26n.pt data=my_dataset/data.yaml. The mAP will be low (we're using a model that doesn't know your classes), but the output should at least list your classes — that proves the YAML is connected.

Done When
You've finished the lesson when all of these are true.
  • Your data.yaml lists path, train, val, and names.

  • yolo val model=yolo26n.pt data=… finds your images and labels.

  • Class indices in your label files match the YAML's names.

Solution

path: ./datasets/my_dataset

train: images/train
val:   images/val

names:
  0: forklift
  1: person
  2: pallet

What's next

Dataset prepared, YAML in hand — let's actually train.