Train your first YOLO model · Data · Lesson 4/10
Lesson · beginner

Prepare a Custom Dataset

Convert images and labels into the Ultralytics YOLO format — folders, files, normalized coordinates.

Ultralytics YOLO expects a very specific dataset layout: images in one folder, label .txt files in a parallel folder, normalized coordinates inside each label. The full datasets overview lists every supported task and source. Once you've set up one dataset this way, you've set them all up — but data collection and annotation is its own discipline. The first time, expect to fight with paths and forget the normalization. We'll do it together.

Outcome

Convert a small set of labeled images into the Ultralytics YOLO directory and label format, ready for training.

Fast Track
If you already know your way around, here's the short version.
  1. Two top-level folders: images/ and labels/, each with train/ and val/ inside.

  2. Image and label filenames must match: images/train/cat_001.jpg ↔ labels/train/cat_001.txt.

  3. Each label line: class_id x_center y_center width height, all normalized 0–1.

  4. Coordinates are relative to image dimensions — not pixels.
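The folder rules above take only a few lines to set up. A sketch using pathlib; the `my_dataset` name is a placeholder for your own project:

```python
from pathlib import Path

root = Path("my_dataset")  # placeholder project name
for split in ("train", "val"):
    # Parallel trees: images/<split> and labels/<split>
    (root / "images" / split).mkdir(parents=True, exist_ok=True)
    (root / "labels" / split).mkdir(parents=True, exist_ok=True)
```

`exist_ok=True` makes the script safe to re-run on an existing dataset.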

Hands-on

The directory layout


my_dataset/
├── images/
│   ├── train/
│   │   ├── 000001.jpg
│   │   └── 000002.jpg
│   └── val/
│       └── 000003.jpg
└── labels/
    ├── train/
    │   ├── 000001.txt
    │   └── 000002.txt
    └── val/
        └── 000003.txt

Ultralytics YOLO detection datasets find labels by replacing /images/ with /labels/ in the image path and changing the extension to .txt. That's it — no manifest file, no JSON. Get the layout right and YOLO finds everything.
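That lookup rule is simple enough to reproduce. A sketch of the substitution (the real implementation lives inside Ultralytics; this just illustrates the idea):

```python
def image_to_label_path(image_path: str) -> str:
    # Replace the last /images/ segment with /labels/ and swap the extension for .txt
    head, _, tail = image_path.rpartition("/images/")
    stem = tail.rsplit(".", 1)[0]
    return f"{head}/labels/{stem}.txt"

print(image_to_label_path("my_dataset/images/train/000001.jpg"))
# my_dataset/labels/train/000001.txt
```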

The label format

One line per object:

class_id  x_center  y_center  width  height

All five numbers separated by whitespace. The class is an integer index. The other four describe a bounding box normalized to [0, 1] by image dimensions:

  • x_center = (left + right) / 2 / image_width
  • y_center = (top + bottom) / 2 / image_height
  • width = (right - left) / image_width
  • height = (bottom - top) / image_height
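Plugging in numbers makes the formulas concrete. Assume a hypothetical 640×480 image with a pixel box from (100, 120) to (300, 360):

```python
img_w, img_h = 640, 480                        # hypothetical image size
left, top, right, bottom = 100, 120, 300, 360  # pixel box corners

x_center = (left + right) / 2 / img_w   # 200/640 = 0.3125
y_center = (top + bottom) / 2 / img_h   # 240/480 = 0.5
width = (right - left) / img_w          # 200/640 = 0.3125
height = (bottom - top) / img_h         # 240/480 = 0.5

print(f"0 {x_center:.4f} {y_center:.4f} {width:.4f} {height:.4f}")
# 0 0.3125 0.5000 0.3125 0.5000
```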

A typical label file:

0 0.4860 0.6312 0.1800 0.4150
2 0.7950 0.5400 0.0900 0.1300

That's two objects: a large class-0 box just left of center in the lower half of the image, and a smaller class-2 box on the right.

Forgotten normalization is the most common bug

A label of 0 487 312 180 415 (raw pixels) silently trains the model to predict garbage. The first epoch will look fine, val mAP will be 0, and you'll lose an afternoon. If your numbers are bigger than 1, you forgot to normalize.
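A pre-flight scan catches this in seconds. A sketch that walks a labels directory and flags any file with a coordinate outside [0, 1]:

```python
from pathlib import Path

def find_unnormalized(labels_dir: str) -> list[str]:
    """Return label files containing a coordinate outside [0, 1]."""
    bad = []
    for txt in Path(labels_dir).rglob("*.txt"):
        for line in txt.read_text().splitlines():
            parts = line.split()
            # Skip the class id (parts[0]); check the four coordinates
            if len(parts) == 5 and not all(0.0 <= float(v) <= 1.0 for v in parts[1:]):
                bad.append(str(txt))
                break
    return bad
```

Run it once per split before training; an empty list means the coordinates at least fall in range.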

Convert from common formats

Most labeling tools export COCO JSON (see the COCO dataset) or Pascal VOC XML. The COCO JSON training guide explains the common path, the JSON2YOLO repo has scripts for the common formats, and tools like Roboflow export YOLO format directly. Or roll a one-off:

import xml.etree.ElementTree as ET
from pathlib import Path

CLASSES = ["forklift", "person", "pallet"]

def voc_to_yolo(xml_path: Path, out_dir: Path):
    """Convert one Pascal VOC XML annotation to a YOLO label file."""
    tree = ET.parse(xml_path)
    root = tree.getroot()
    w = int(root.find("size/width").text)
    h = int(root.find("size/height").text)
    lines = []
    for obj in root.findall("object"):
        name = obj.find("name").text
        if name not in CLASSES:
            continue  # skip classes we aren't training on
        cls_id = CLASSES.index(name)
        b = obj.find("bndbox")
        x1, y1 = float(b.find("xmin").text), float(b.find("ymin").text)
        x2, y2 = float(b.find("xmax").text), float(b.find("ymax").text)
        # Normalize: corners -> center/size, divided by image dimensions
        xc = (x1 + x2) / 2 / w
        yc = (y1 + y2) / 2 / h
        bw = (x2 - x1) / w
        bh = (y2 - y1) / h
        lines.append(f"{cls_id} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    out_dir.mkdir(parents=True, exist_ok=True)
    out_dir.joinpath(xml_path.stem + ".txt").write_text("\n".join(lines))
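COCO JSON is the other common export; it stores boxes as [x_min, y_min, width, height] in absolute pixels, so the same normalization applies. A minimal one-off sketch (field names follow the COCO annotation schema; the class-id remapping is one reasonable choice, not the only one):

```python
import json
from collections import defaultdict
from pathlib import Path

def coco_to_yolo(coco_json: Path, out_dir: Path):
    """Convert a COCO JSON annotation file to one YOLO .txt per image."""
    data = json.loads(coco_json.read_text())
    # COCO category ids are arbitrary integers; remap them to 0..N-1 for YOLO
    remap = {cid: i for i, cid in enumerate(sorted(c["id"] for c in data["categories"]))}
    images = {im["id"]: im for im in data["images"]}
    lines = defaultdict(list)
    for ann in data["annotations"]:
        im = images[ann["image_id"]]
        w, h = im["width"], im["height"]
        x, y, bw, bh = ann["bbox"]  # pixels: top-left corner + box size
        xc, yc = (x + bw / 2) / w, (y + bh / 2) / h
        lines[im["file_name"]].append(
            f"{remap[ann['category_id']]} {xc:.6f} {yc:.6f} {bw / w:.6f} {bh / h:.6f}"
        )
    out_dir.mkdir(parents=True, exist_ok=True)
    for fname, rows in lines.items():
        out_dir.joinpath(Path(fname).stem + ".txt").write_text("\n".join(rows))
```

Note the bbox convention difference: VOC gives corners, COCO gives top-left corner plus size; both end up as normalized center/size in YOLO.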

A starter dataset to practice on

Ultralytics ships several "tiny" datasets you can use as templates: coco8, coco128, african-wildlife, crack-seg. They're small enough to download and inspect, big enough to actually train on:

# Auto-downloads the dataset the first time
yolo train model=yolo26n.pt data=coco8.yaml epochs=1

After running, look at ~/datasets/coco8/ to see the exact directory layout — useful as a reference when you set up your own.

Sanity-check before training

Before you train, eyeball your labels — a quick pass of data cleaning here saves epochs later, and the trainer will add data augmentation on top of whatever you ship. One useful check: run a pretrained model over your training images with save_txt=True, which writes predictions in the same txt format so you can compare them against your own label files:

yolo predict model=yolo26n.pt source=my_dataset/images/train save_txt=True

Or write a quick script that draws every label box on its image and saves a 4×4 grid. Eyes catch things metrics never will — a class shifted by one, half the boxes drawn around shadows, normalization that flipped y and x.
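The drawing step needs the inverse mapping. A sketch that turns a YOLO label line back into pixel corners you can hand to any drawing library:

```python
def yolo_to_pixels(line: str, img_w: int, img_h: int) -> tuple[int, int, int, int, int]:
    """Convert 'cls xc yc w h' (normalized) back to (cls, left, top, right, bottom) pixels."""
    cls, xc, yc, w, h = line.split()
    # Scale back to pixel units
    xc, yc, w, h = float(xc) * img_w, float(yc) * img_h, float(w) * img_w, float(h) * img_h
    # Center/size -> corners
    return int(cls), round(xc - w / 2), round(yc - h / 2), round(xc + w / 2), round(yc + h / 2)

print(yolo_to_pixels("0 0.5 0.5 0.25 0.5", 640, 480))
# (0, 240, 120, 400, 360)
```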

Try It

Take 10 of your project's images and convert them into the YOLO layout. Open one label file and confirm all numbers are between 0 and 1. Open the corresponding image and confirm the boxes are sensible.
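The filename check is easy to automate. A sketch that reports stems present on only one side of the images/labels pair (paths are illustrative):

```python
from pathlib import Path

def check_pairs(root: str, split: str = "train") -> tuple[set[str], set[str]]:
    """Return (images missing labels, labels missing images) by filename stem."""
    img_stems = {p.stem for p in (Path(root) / "images" / split).glob("*") if p.is_file()}
    lbl_stems = {p.stem for p in (Path(root) / "labels" / split).glob("*.txt")}
    return img_stems - lbl_stems, lbl_stems - img_stems
```

Two empty sets means every image has a label file and vice versa; an image with no objects should still get an empty .txt or be deliberately left label-free (YOLO treats a missing label as background).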

Done When
You've finished the lesson when all of these are true.
  • Your dataset has the correct images/, labels/, train/, val/ layout.

  • All label coordinates are normalized to [0, 1].

  • Image and label filenames match (without extensions).

Show solution
# A valid label line: integer class id, then four normalized coordinates
0 0.4860 0.6312 0.1800 0.4150
# Check the layout and the numbers
ls my_dataset/images/train | head -5
ls my_dataset/labels/train | head -5
head -1 my_dataset/labels/train/000001.txt    # coordinates after the class id should be in [0, 1]
What's next

Folders and files in place. Now we tell YOLO about them with a single YAML.