Prepare a Custom Dataset
Convert images and labels into the Ultralytics YOLO format — folders, files, normalized coordinates.
Ultralytics YOLO expects a very specific dataset layout: images in one folder, label .txt files in a parallel folder, normalized coordinates inside each label. This lesson covers the mechanical conversion — folders, paths, normalization. The earlier Building High-Performance YOLO Datasets course covers the harder, upstream work; the data collection and annotation guide and preprocessing annotated data guide are the deeper docs references for that work. The full datasets overview lists every supported task and source.
Convert a small set of labeled images into the Ultralytics YOLO directory and label format, ready for training.
Two top-level folders: images/ and labels/, each with train/ and val/ inside.
Image and label filenames must match: images/train/cat_001.jpg ↔ labels/train/cat_001.txt.
Each label line: class_id x_center y_center width height, all normalized 0–1.
Coordinates are relative to image dimensions, not pixels.
Hands-on
The directory layout

my_dataset/
├── images/
│   ├── train/
│   │   ├── 000001.jpg
│   │   └── 000002.jpg
│   └── val/
│       └── 000003.jpg
└── labels/
    ├── train/
    │   ├── 000001.txt
    │   └── 000002.txt
    └── val/
        └── 000003.txt

Ultralytics YOLO detection datasets find labels by replacing /images/ with /labels/ in the image path and changing the extension to .txt. That's it: no manifest file, no JSON. Get the layout right and YOLO finds everything.
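The lookup rule above can be sketched in a few lines of Python. This is an illustration of the rule, not the library's own code: `image_to_label_path` is a hypothetical helper name, and the sketch assumes POSIX-style paths and that the last `images` directory in the path is the one to swap.

```python
from pathlib import PurePosixPath


def image_to_label_path(image_path: str) -> str:
    """Map an image path to the label path implied by the images/ -> labels/ rule.

    Hypothetical helper for illustration; assumes POSIX-style paths.
    """
    parts = list(PurePosixPath(image_path).parts)
    # Walk backwards over the directory components (skipping the filename)
    # and swap the last "images" directory for "labels".
    for i in range(len(parts) - 2, -1, -1):
        if parts[i] == "images":
            parts[i] = "labels"
            break
    else:
        raise ValueError(f"no images/ directory in {image_path!r}")
    # Replace the image extension with .txt.
    return str(PurePosixPath(*parts).with_suffix(".txt"))


print(image_to_label_path("my_dataset/images/train/000001.jpg"))
# my_dataset/labels/train/000001.txt
```

If a label file does not turn up where this mapping says it should, the layout is the first thing to check.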
The label format
One line per object:
class_id x_center y_center width height

All five numbers are separated by whitespace. The class is an integer index. The other four describe a bounding box normalized to [0, 1] by image dimensions:

x_center = (left + right) / 2 / image_width
y_center = (top + bottom) / 2 / image_height
width = (right - left) / image_width
height = (bottom - top) / image_height
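As a worked example of these four formulas, here is a minimal sketch in Python. The function name `box_to_yolo_line` and the 1000×800 image with its pixel box are made up for illustration; only the arithmetic comes from the definitions above.

```python
def box_to_yolo_line(class_id, left, top, right, bottom, image_width, image_height):
    """Convert a pixel-space corner box to one YOLO label line."""
    x_center = (left + right) / 2 / image_width
    y_center = (top + bottom) / 2 / image_height
    width = (right - left) / image_width
    height = (bottom - top) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: a 1000x800 image with a box from (396, 339) to (576, 671).
line = box_to_yolo_line(0, 396, 339, 576, 671, image_width=1000, image_height=800)
print(line)  # 0 0.486000 0.631250 0.180000 0.415000
```

Note that x values are divided by the image width and y values by the image height; mixing the two up still produces numbers in [0, 1], so the mistake is easy to miss.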
A typical label file:
0 0.4860 0.6312 0.1800 0.4150
2 0.7950 0.5400 0.0900 0.1300

That's two objects: a class-0 box (large, just below the image center) and a class-2 box (smaller, to the right and slightly higher).
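A label of 0 487 312 180 415 (raw pixels) is useless to the trainer. Depending on your Ultralytics version, the loader may flag the file as corrupt and skip the image with an easy-to-miss warning, so the model trains with few or no boxes; either way val mAP will be 0, and you'll lose an afternoon. If your numbers are bigger than 1, you forgot to normalize.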
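A quick automated check catches un-normalized labels before any training run. The following is a minimal sketch, not part of the Ultralytics tooling: `check_label_line` is a hypothetical helper that validates one detection label line against the format described above.

```python
def check_label_line(line: str) -> list[str]:
    """Return a list of problems found in one YOLO detection label line."""
    fields = line.split()
    if len(fields) != 5:
        return [f"expected 5 fields, found {len(fields)}"]
    problems = []
    class_id, *coords = fields
    if not class_id.isdigit():
        problems.append(f"class id {class_id!r} is not a non-negative integer")
    for name, text in zip(("x_center", "y_center", "width", "height"), coords):
        try:
            value = float(text)
        except ValueError:
            problems.append(f"{name} {text!r} is not a number")
            continue
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name}={value} is outside [0, 1]")
    return problems


print(check_label_line("0 0.4860 0.6312 0.1800 0.4150"))  # []
print(check_label_line("0 487 312 180 415"))  # four "outside [0, 1]" problems
```

Running this over every line of every file in labels/ takes seconds and flags the raw-pixel mistake immediately.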
Convert from common formats
Most labeling tools export COCO JSON (see the COCO dataset) or Pascal VOC XML. The COCO JSON training guide explains the common path, the JSON2YOLO repo has scripts for the common formats, and tools like Roboflow export YOLO format directly. Or roll a one-off:
import xml.etree.ElementTree as ET
from pathlib import Path

# Class names in index order; a name's position in the list is its YOLO class_id.
CLASSES = ["forklift", "person", "pallet"]


def voc_to_yolo(xml_path: Path, out_dir: Path) -> None:
    """Convert one Pascal VOC XML file into a YOLO label .txt file in out_dir."""
    tree = ET.parse(xml_path)
    root = tree.getroot()
    w = int(root.find("size/width").text)
    h = int(root.find("size/height").text)
    lines = []
    for obj in root.findall("object"):
        name = obj.find("name").text
        if name not in CLASSES:
            continue  # skip objects that are not in the class list
        cls_id = CLASSES.index(name)
        b = obj.find("bndbox")
        x1, y1 = float(b.find("xmin").text), float(b.find("ymin").text)
        x2, y2 = float(b.find("xmax").text), float(b.find("ymax").text)
        # Pixel corners -> normalized center and size
        xc = (x1 + x2) / 2 / w
        yc = (y1 + y2) / 2 / h
        bw = (x2 - x1) / w
        bh = (y2 - y1) / h
        lines.append(f"{cls_id} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    out_dir.mkdir(parents=True, exist_ok=True)  # create the label folder if missing
    out_dir.joinpath(xml_path.stem + ".txt").write_text("\n".join(lines))

A starter dataset to practice on
Ultralytics ships several "tiny" datasets you can use as templates: coco8, coco128, african-wildlife, crack-seg. They're small enough to download and inspect, big enough to actually train on:
# Auto-downloads the first time
yolo val model=yolo26n.pt data=coco8.yaml

After running, look at ~/datasets/coco8/ to see the exact directory layout, which is useful as a reference when you set up your own.
Sanity-check before training
Before you train, eyeball your labels — a quick pass of data cleaning here saves epochs later, and the trainer will add data augmentation on top of whatever you ship:
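yolo predict model=yolo26n.pt source=my_dataset/images/train save_txt=True

This runs a pretrained model over your training images and saves its predictions as .txt files, which gives you a rough reference to compare against your own labels. Or write a quick script that draws every label box on its image and saves a 4×4 grid. Eyes catch things metrics never will: a class shifted by one, half the boxes drawn around shadows, normalization that swapped x and y.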
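A script that draws label boxes needs pixel coordinates, so its core step is inverting the normalization. Below is a minimal sketch of that step only, assuming you already know the image size; `yolo_line_to_pixel_box` is a hypothetical name, and the drawing itself (with whichever imaging library you prefer) is left out.

```python
def yolo_line_to_pixel_box(line, image_width, image_height):
    """Convert one YOLO label line to (class_id, left, top, right, bottom) in pixels."""
    class_id, xc, yc, w, h = line.split()
    xc, yc, w, h = float(xc), float(yc), float(w), float(h)
    # Undo the normalization: center/size fractions -> pixel corners.
    left = (xc - w / 2) * image_width
    top = (yc - h / 2) * image_height
    right = (xc + w / 2) * image_width
    bottom = (yc + h / 2) * image_height
    return int(class_id), round(left), round(top), round(right), round(bottom)


# Hypothetical 1000x800 image with one centered box.
box = yolo_line_to_pixel_box("0 0.5 0.5 0.2 0.4", 1000, 800)
print(box)  # (0, 400, 240, 600, 560)
```

If the recovered corners fall outside the image or the box looks transposed, the label was probably written with x and y, or width and height, swapped.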
Take 10 of your project's images and convert them into the YOLO layout. Open one label file and confirm that the four box values on every line are between 0 and 1 (the leading class id is an integer index and can be larger). Open the corresponding image and confirm the boxes are sensible.
Your dataset has the correct images/, labels/, train/, val/ layout.
All label coordinates are normalized to [0, 1].
Image and label filenames match (without extensions).
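The filename check in this list can be automated. The sketch below compares filename stems between images/ and labels/ for one split; `find_mismatches` and the `IMAGE_EXTENSIONS` set are assumptions made for illustration, and the demo builds a throwaway dataset in a temporary directory so it runs anywhere.

```python
import tempfile
from pathlib import Path

# Assumed subset of image extensions; extend it to match your data.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def find_mismatches(dataset_root, split):
    """Return (images without labels, labels without images) for one split, by stem."""
    root = Path(dataset_root)
    image_stems = {
        p.stem
        for p in (root / "images" / split).iterdir()
        if p.suffix.lower() in IMAGE_EXTENSIONS
    }
    label_stems = {p.stem for p in (root / "labels" / split).glob("*.txt")}
    return sorted(image_stems - label_stems), sorted(label_stems - image_stems)


# Demo on a throwaway dataset with one unlabeled image and one orphan label.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "my_dataset"
    for sub in ("images/train", "labels/train"):
        (root / sub).mkdir(parents=True)
    (root / "images/train/000001.jpg").touch()
    (root / "images/train/000002.jpg").touch()
    (root / "labels/train/000001.txt").touch()
    (root / "labels/train/000009.txt").touch()
    result = find_mismatches(root, "train")

print(result)  # (['000002'], ['000009'])
```

Pointing `find_mismatches` at your own dataset root for both "train" and "val" covers the third item in the checklist.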
Show solution
ls my_dataset/images/train | head -5
ls my_dataset/labels/train | head -5
head -1 my_dataset/labels/train/000001.txt  # the four values after the class id should be between 0 and 1

The first line of a valid label file looks like this:
0 0.4860 0.6312 0.1800 0.4150

Folders and files in place. Now we tell YOLO about them with a single YAML.