Prepare a Custom Dataset
Convert images and labels into the Ultralytics YOLO format — folders, files, normalized coordinates.
Ultralytics YOLO expects a very specific dataset layout: images in one folder, label .txt files in a parallel folder, normalized coordinates inside each label. The full datasets overview lists every supported task and source. Once you've set up one dataset this way, you've set them all up — but data collection and annotation is its own discipline. The first time, expect to fight with paths and forget the normalization. We'll do it together.
Convert a small set of labeled images into the Ultralytics YOLO directory and label format, ready for training.
Two top-level folders: images/ and labels/, each with train/ and val/ inside.
Image and label filenames must match: images/train/cat_001.jpg ↔ labels/train/cat_001.txt.
Each label line is class_id x_center y_center width height, with the four box values normalized to 0–1.
Coordinates are relative to image dimensions, not pixels.
Hands-on
The directory layout

my_dataset/
├── images/
│   ├── train/
│   │   ├── 000001.jpg
│   │   └── 000002.jpg
│   └── val/
│       └── 000003.jpg
└── labels/
    ├── train/
    │   ├── 000001.txt
    │   └── 000002.txt
    └── val/
        └── 000003.txt
Ultralytics YOLO finds each label by replacing the last /images/ segment of the image path with /labels/ and changing the extension to .txt. That's it: no manifest file, no JSON. Get the layout right and YOLO finds everything.
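A minimal sketch of that lookup rule, handy for checking your own paths before training. `image_to_label_path` is a hypothetical helper written for illustration, not part of the Ultralytics API, and it assumes forward-slash paths that contain an /images/ segment.

```python
from pathlib import PurePosixPath


def image_to_label_path(image_path: str) -> str:
    """Map an image path to its label path with the /images/ -> /labels/ rule.

    Hypothetical helper for illustration; assumes forward-slash separators.
    """
    marker, replacement = "/images/", "/labels/"
    head, sep, tail = image_path.rpartition(marker)
    if not sep:
        raise ValueError(f"no '{marker}' segment in {image_path!r}")
    # Swap only the last /images/ segment, then change the extension to .txt
    swapped = head + replacement + tail
    return str(PurePosixPath(swapped).with_suffix(".txt"))


print(image_to_label_path("my_dataset/images/train/000001.jpg"))
# my_dataset/labels/train/000001.txt
```

Using rpartition means a parent folder that happens to be called images/ higher up the path is left alone; only the segment closest to the file is swapped.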
The label format
One line per object:
class_id x_center y_center width height
All five numbers are separated by whitespace. The class is a zero-based integer index (the first class is 0). The other four describe a bounding box normalized to [0, 1] by the image dimensions:
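x_center = (left + right) / 2 / image_width
y_center = (top + bottom) / 2 / image_height
width = (right - left) / image_width
height = (bottom - top) / image_height
Here left, right, top, and bottom are the box edges in pixels, with the origin at the image's top-left corner.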
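The four formulas translate directly into code. A minimal sketch, assuming pixel corner coordinates with the origin at the top-left; `corners_to_yolo` is a hypothetical helper name, and the 640×480 image in the example is made up for illustration.

```python
def corners_to_yolo(left: float, top: float, right: float, bottom: float,
                    image_width: int, image_height: int) -> tuple[float, float, float, float]:
    """Convert a pixel box given by its edges to normalized YOLO (x_center, y_center, width, height)."""
    x_center = (left + right) / 2 / image_width   # x values are divided by the width
    y_center = (top + bottom) / 2 / image_height  # y values are divided by the height
    width = (right - left) / image_width
    height = (bottom - top) / image_height
    return x_center, y_center, width, height


# Worked example: a 200x240 px box spanning (100, 120) to (300, 360) in a 640x480 image
print(corners_to_yolo(100, 120, 300, 360, 640, 480))
# (0.3125, 0.5, 0.3125, 0.5)
```

Dividing x by the height (or y by the width) still gives numbers in [0, 1], which is why that mistake slips past a simple range check.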
A typical label file:
0 0.4860 0.6312 0.1800 0.4150
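2 0.7950 0.5400 0.0900 0.1300
That's two objects: a larger class-0 box centered slightly left of the middle and in the lower half of the image, and a smaller class-2 box on the right side, near mid-height. Because y is measured from the top edge, larger y_center values sit lower in the image.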
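Reading a label back into pixel corners is the inverse of those formulas, and it is the first step of any box-drawing check. A minimal sketch, assuming a hypothetical 1000×800 image (label files do not record the image size); `yolo_to_corners` is an illustrative name, not a library function.

```python
def yolo_to_corners(line: str, image_width: int, image_height: int):
    """Parse one detection label line into (class_id, left, top, right, bottom) in pixels."""
    class_id, *coords = line.split()
    x_center, y_center, width, height = map(float, coords)
    # Undo the normalization, then convert center/size back to edges
    x_center, width = x_center * image_width, width * image_width
    y_center, height = y_center * image_height, height * image_height
    return (int(class_id),
            x_center - width / 2, y_center - height / 2,
            x_center + width / 2, y_center + height / 2)


label_text = """0 0.4860 0.6312 0.1800 0.4150
2 0.7950 0.5400 0.0900 0.1300"""

# Assumption: a 1000x800 image; the label file itself does not store this
for line in label_text.splitlines():
    cls, left, top, right, bottom = yolo_to_corners(line, 1000, 800)
    print(f"class {cls}: ({left:.0f}, {top:.0f}) to ({right:.0f}, {bottom:.0f})")
# class 0: (396, 339) to (576, 671)
# class 2: (750, 380) to (840, 484)
```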
A label like 0 487 312 180 415 (raw pixels) is wrong, and the mistake is easy to miss: the first epoch can look fine while val mAP sits at 0, and you'll lose an afternoon. Read the warnings printed when the dataset is first scanned, and if any of the four coordinates after the class ID is bigger than 1, you forgot to normalize.
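That rule is easy to automate. A minimal sketch: `check_label_line` is a hypothetical helper (not an Ultralytics function) that flags the mistakes above for one detection label line, and `num_classes` is an assumption you supply from your own class list.

```python
def check_label_line(line: str, num_classes: int) -> list[str]:
    """Return the problems found in one detection label line; an empty list means it looks OK."""
    fields = line.split()
    if len(fields) != 5:
        return [f"expected 5 fields, got {len(fields)}"]
    problems = []
    # The class must be a plain integer index into your class list
    if not fields[0].isdigit() or int(fields[0]) >= num_classes:
        problems.append(f"class_id {fields[0]!r} is not an integer in [0, {num_classes - 1}]")
    try:
        coords = [float(v) for v in fields[1:]]
    except ValueError:
        return problems + ["coordinates are not numbers"]
    # Values such as 487 or 312 almost always mean raw pixels
    if any(not 0.0 <= c <= 1.0 for c in coords):
        problems.append("coordinates outside [0, 1]: did you forget to normalize?")
    return problems


print(check_label_line("0 0.4860 0.6312 0.1800 0.4150", num_classes=3))  # []
print(check_label_line("0 487 312 180 415", num_classes=3))
# ['coordinates outside [0, 1]: did you forget to normalize?']
```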
Convert from common formats
Many labeling tools export COCO JSON (see the COCO dataset) or Pascal VOC XML. The COCO JSON training guide covers the COCO path, the JSON2YOLO repo has conversion scripts for the common formats, and tools like Roboflow export YOLO format directly. Or roll a one-off converter; here is one for Pascal VOC XML:
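import xml.etree.ElementTree as ET
from pathlib import Path

# List order defines the integer IDs: forklift=0, person=1, pallet=2
CLASSES = ["forklift", "person", "pallet"]


def voc_to_yolo(xml_path: Path, out_dir: Path) -> None:
    """Convert one Pascal VOC XML annotation into a YOLO .txt label file."""
    root = ET.parse(xml_path).getroot()
    # Image size is needed to normalize pixel coordinates to [0, 1]
    w = int(root.find("size/width").text)
    h = int(root.find("size/height").text)
    lines = []
    for obj in root.findall("object"):
        name = obj.find("name").text
        if name not in CLASSES:
            continue  # skip classes we are not training on
        cls_id = CLASSES.index(name)
        b = obj.find("bndbox")
        x1, y1 = float(b.find("xmin").text), float(b.find("ymin").text)
        x2, y2 = float(b.find("xmax").text), float(b.find("ymax").text)
        # Pixel corners -> normalized center and size
        xc = (x1 + x2) / 2 / w
        yc = (y1 + y2) / 2 / h
        bw = (x2 - x1) / w
        bh = (y2 - y1) / h
        lines.append(f"{cls_id} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    out_dir.mkdir(parents=True, exist_ok=True)
    # Same stem as the XML/image; an image with no matching objects gets an empty file
    out_dir.joinpath(xml_path.stem + ".txt").write_text("\n".join(lines))
A starter dataset to practice on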
Ultralytics ships several "tiny" datasets you can use as templates: coco8, coco128, african-wildlife, crack-seg. They're small enough to download and inspect, big enough to actually train on:
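# Auto-downloads coco8 the first time, because validation needs the images
yolo val model=yolo26n.pt data=coco8.yaml
After running, open the coco8/ folder in your datasets directory (yolo settings shows its location as datasets_dir) to see the exact directory layout. It's a useful reference when you set up your own.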
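To compare that reference layout with your own side by side, a small tree printer helps. A minimal sketch using only the standard library; `print_tree` and its `max_files` parameter are hypothetical names chosen for this example.

```python
from pathlib import Path


def print_tree(root: Path, max_files: int = 2, indent: str = "") -> None:
    """Print every folder under root, but at most max_files files per folder."""
    entries = sorted(root.iterdir())
    dirs = [p for p in entries if p.is_dir()]
    files = [p for p in entries if p.is_file()]
    for d in dirs:
        print(f"{indent}{d.name}/")
        print_tree(d, max_files, indent + "    ")  # recurse one level deeper
    for f in files[:max_files]:
        print(f"{indent}{f.name}")
    if len(files) > max_files:
        print(f"{indent}... (+{len(files) - max_files} more)")
```

For example, call print_tree(Path("my_dataset")), then make the same call on the coco8 folder, and check that both show images/ and labels/ with the same split folders inside.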
Sanity-check before training
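Before you train, eyeball your labels. A quick pass of data cleaning here saves epochs later, and the trainer will add data augmentation on top of whatever you ship, so label mistakes go straight into training. One rough check is to run a pretrained model over your training images and compare its saved predictions with your own labels:
yolo predict model=yolo26n.pt source=my_dataset/images/train save_txt=True
This saves annotated images plus YOLO-format .txt predictions under runs/, and it only tells you something for classes the pretrained model already knows (its class IDs follow its own training set, not yours). The more direct check is a quick script that draws every label box on its image and saves a 4×4 grid. Eyes catch things metrics never will: a class shifted by one, half the boxes drawn around shadows, normalization that flipped x and y.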
Take 10 of your project's images and convert them into the YOLO layout. Open one label file and confirm that every coordinate (the four numbers after the class ID) is between 0 and 1. Open the corresponding image and confirm the boxes are sensible.
Your dataset has the correct images/, labels/, train/, val/ layout.
All label coordinates are normalized to [0, 1].
Image and label filenames match (without extensions).
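Those three checks are easy to script. A minimal sketch using only the standard library, assuming the my_dataset/ layout shown earlier and detection labels with five fields per line; `validate_dataset` is an illustrative name, not an Ultralytics function, and the `IMAGE_EXTS` set is an assumption to adjust for your data. It reports images without a label file to match the checklist, even though an image with no objects may legitimately have none.

```python
from pathlib import Path

# Assumption: extend this set to match the image types in your dataset
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp"}


def validate_dataset(root: Path) -> list[str]:
    """Check folder layout, image/label filename pairing, and coordinate normalization."""
    problems = []
    for split in ("train", "val"):
        img_dir = root / "images" / split
        lbl_dir = root / "labels" / split
        missing = [d for d in (img_dir, lbl_dir) if not d.is_dir()]
        if missing:
            problems += [f"missing folder: {d}" for d in missing]
            continue
        # Pair files by stem, i.e. the filename without its extension
        img_stems = {p.stem for p in img_dir.iterdir() if p.suffix.lower() in IMAGE_EXTS}
        lbl_stems = {p.stem for p in lbl_dir.glob("*.txt")}
        problems += [f"{split}: image without label: {s}" for s in sorted(img_stems - lbl_stems)]
        problems += [f"{split}: label without image: {s}" for s in sorted(lbl_stems - img_stems)]
        # Each non-empty line needs 5 fields, and fields 2-5 must lie in [0, 1]
        for txt in sorted(lbl_dir.glob("*.txt")):
            for n, line in enumerate(txt.read_text().splitlines(), start=1):
                fields = line.split()
                if not fields:
                    continue
                if len(fields) != 5:
                    problems.append(f"{txt}:{n}: expected 5 fields, got {len(fields)}")
                    continue
                try:
                    coords = [float(v) for v in fields[1:]]
                except ValueError:
                    problems.append(f"{txt}:{n}: non-numeric coordinate")
                    continue
                if any(not 0.0 <= c <= 1.0 for c in coords):
                    problems.append(f"{txt}:{n}: coordinates not normalized to [0, 1]")
    return problems


if __name__ == "__main__":
    for problem in validate_dataset(Path("my_dataset")):
        print(problem)
```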
Solution
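ls my_dataset/images/train | head -5
ls my_dataset/labels/train | head -5
head -1 my_dataset/labels/train/000001.txt  # the 4 numbers after the class ID should be between 0 and 1
A healthy label line looks like 0 0.4860 0.6312 0.1800 0.4150, and the two ls listings should show the same names apart from the extension.
Folders and files in place. Now we tell YOLO about them with a single YAML.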