Lesson · Intermediate

Object Tracking with ByteTrack and BoT-SORT

Persistent IDs across frames — the foundation of every counting, alerting, and analytics pipeline.

Object detection finds objects in each frame independently. Object tracking, also called multi-object tracking (MOT), links those detections through time so the same physical object keeps the same ID. Once you have IDs, downstream pipelines such as counting, dwell time, trajectories, and speed estimation reduce to simple bookkeeping over those IDs.

Outcome

Run Ultralytics YOLO tracking with ByteTrack and BoT-SORT, choose between them, and use box.id in a real pipeline.

Fast Track

If you already know your way around, here's the short version.

model.track(source, persist=True, tracker='bytetrack.yaml').
Each box gets a stable id across frames.
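ByteTrack: fast, simple, a good first choice.
BoT-SORT: slower, adds camera-motion compensation and optional appearance (ReID) features for occlusion-heavy scenes. It is also what model.track uses when no tracker is given.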

Hands-on

What a tracker actually does

A tracker maintains a set of "tracks" — hypotheses about real-world objects. Each frame's detections are matched to tracks based on:

Motion prediction: where the track should be this frame (Kalman filter).
Spatial overlap: IoU between the predicted track location and each detection.
Appearance (BoT-SORT with ReID enabled): visual features for re-identification, so a detection can be matched back to its track even after occlusion.

Tracks that go unmatched for longer than the configured buffer (track_buffer, 30 frames by default) are deleted. Unmatched detections with high enough confidence become new tracks.
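To make the match, age, and create cycle concrete, here is a minimal sketch of a toy tracker. It is not the ByteTrack or BoT-SORT implementation: it skips the Kalman filter, uses greedy matching on raw IoU, and ToyTracker, min_iou, and max_missed are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from itertools import count


def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    track_id: int
    box: tuple
    missed: int = 0  # consecutive frames without a matching detection


class ToyTracker:
    """Hypothetical teaching tracker: greedy IoU matching, no motion model."""

    def __init__(self, min_iou=0.3, max_missed=2):
        self.min_iou = min_iou
        self.max_missed = max_missed
        self.tracks = []
        self._ids = count(1)

    def update(self, detections):
        """Match this frame's boxes to tracks; return [(track_id, box), ...]."""
        # Score every track/detection pair, best overlap first.
        pairs = sorted(
            ((iou(t.box, d), ti, di)
             for ti, t in enumerate(self.tracks)
             for di, d in enumerate(detections)),
            reverse=True,
        )
        used_tracks, used_dets, output = set(), set(), []
        for score, ti, di in pairs:
            if score < self.min_iou:
                break  # remaining pairs overlap too little to match
            if ti in used_tracks or di in used_dets:
                continue
            self.tracks[ti].box = detections[di]
            self.tracks[ti].missed = 0
            used_tracks.add(ti)
            used_dets.add(di)
            output.append((self.tracks[ti].track_id, detections[di]))

        # Unmatched tracks age; tracks unmatched for too long are deleted.
        for ti, t in enumerate(self.tracks):
            if ti not in used_tracks:
                t.missed += 1
        self.tracks = [t for t in self.tracks if t.missed <= self.max_missed]

        # Unmatched detections start new tracks with fresh IDs.
        for di, d in enumerate(detections):
            if di not in used_dets:
                track = Track(next(self._ids), d)
                self.tracks.append(track)
                output.append((track.track_id, d))
        return output
```

Calling update once per frame with that frame's boxes returns the same ID for a box that moves a little between frames, and a fresh ID for anything that appears where no track was. Real trackers replace the raw IoU with overlap against a motion-predicted box and use an optimal assignment instead of the greedy loop.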

ByteTrack: fast and robust

from ultralytics import YOLO

model = YOLO("yolo26n.pt")
results = model.track(
    "input.mp4",
    persist=True,
    tracker="bytetrack.yaml",
    conf=0.25,
)

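ByteTrack's trick: after matching high-confidence detections to tracks, it runs a second matching pass that pairs the still-unmatched tracks with low-confidence detections. A low-confidence detection that matches an existing track keeps that track alive; one that matches nothing is discarded rather than starting a new track. This recovers tracks during partial occlusion, when confidence dips temporarily. In the Ultralytics pipeline, detections below conf appear to be filtered out before they reach the tracker, so the second pass only has something to work with when conf sits below the tracker's track_high_thresh.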
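The two-pass idea can be sketched in a few lines. This is a simplified illustration, not the library's code: it uses greedy matching on plain IoU where ByteTrack uses an optimal assignment on motion-predicted boxes, and the function names and the high, low, and min_iou values are assumptions made for the example.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(track_ids, det_ids, tracks, dets, min_iou):
    """Pair the given tracks and detections by descending IoU."""
    pairs = sorted(
        ((iou(tracks[t], dets[d][0]), t, d) for t in track_ids for d in det_ids),
        reverse=True,
    )
    matches, used_dets = {}, set()
    for score, t, d in pairs:
        if score < min_iou:
            break
        if t in matches or d in used_dets:
            continue
        matches[t] = d
        used_dets.add(d)
    return matches


def two_stage_associate(tracks, dets, high=0.25, low=0.1, min_iou=0.3):
    """tracks: list of boxes; dets: list of (box, confidence).

    Returns (matches, new_candidates): matches maps track index to
    detection index, new_candidates lists unmatched high-confidence
    detections that may start new tracks.
    """
    high_ids = [i for i, (_, c) in enumerate(dets) if c >= high]
    low_ids = [i for i, (_, c) in enumerate(dets) if low <= c < high]

    # Pass 1: every track against the high-confidence detections.
    matches = greedy_match(range(len(tracks)), high_ids, tracks, dets, min_iou)
    matched_high = set(matches.values())

    # Pass 2: leftover tracks against the low-confidence detections.
    leftover = [t for t in range(len(tracks)) if t not in matches]
    matches.update(greedy_match(leftover, low_ids, tracks, dets, min_iou))

    # Low-confidence detections that matched nothing are simply dropped.
    new_candidates = [d for d in high_ids if d not in matched_high]
    return matches, new_candidates
```

The design point is the asymmetry: a weak detection is trusted only as evidence for a track that already exists, never as grounds for creating one.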

BoT-SORT: appearance-aware

results = model.track(
    "input.mp4",
    persist=True,
    tracker="botsort.yaml",
)

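BoT-SORT builds on the same association scheme and adds camera-motion compensation, plus an optional appearance embedding per detection for re-identification when objects briefly disappear (behind a tree, a pole, or another object). In the botsort.yaml shipped with recent Ultralytics releases, the appearance step is controlled by with_reid and is off by default, so check that setting if you are relying on re-identification. BoT-SORT is slower, sometimes by 30% or more, but in scenes with frequent occlusions or a moving camera the gain in ID consistency can be substantial.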

Choose	When
ByteTrack	Open scenes, simple paths, throughput matters
BoT-SORT	Crowded scenes, frequent occlusions, IDs must persist

Using IDs in a pipeline

from collections import defaultdict
from ultralytics import YOLO

model = YOLO("yolo26n.pt")

counts = defaultdict(set)

for r in model.track("input.mp4", stream=True, persist=True, tracker="bytetrack.yaml"):
    for box in r.boxes:
        if box.id is None:
            continue
        cls = r.names[int(box.cls)]
        counts[cls].add(int(box.id))

for cls, ids in counts.items():
    print(f"{cls}: {len(ids)} unique objects")

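That's the whole pipeline. Tracking gives you stable IDs, and counting unique IDs per class gives you object counts that, unlike summed per-frame detections, do not grow with video length. The count is only as good as the tracker: every ID switch adds one to the total. Built-in solutions like region counting, heatmaps, and distance calculation are all thin layers on top of this loop.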
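The same bookkeeping extends to dwell time. As a rough sketch, assuming you have collected the track IDs seen in each frame (for example, the int(box.id) values from the loop above) and know the source frame rate, you can count frames per ID and divide by fps. The helper name dwell_seconds is hypothetical.

```python
from collections import Counter


def dwell_seconds(ids_per_frame, fps):
    """Estimate how long each track ID was visible.

    ids_per_frame: iterable with one collection of track IDs per frame.
    fps: frame rate of the source video.
    """
    frames_seen = Counter()
    for ids in ids_per_frame:
        # set() guards against an ID being listed twice in one frame.
        frames_seen.update(set(ids))
    return {track_id: n / fps for track_id, n in frames_seen.items()}


# Four frames at 2 fps: ID 1 is seen in three frames, ID 2 in two.
print(dwell_seconds([[1, 2], [1], [1, 2], []], fps=2))
# {1: 1.5, 2: 1.0}
```

Frames where the object was occluded but the track survived are not counted here, so this measures visible time rather than time between first and last sighting.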

Tuning the tracker

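The tracker config files (bytetrack.yaml, botsort.yaml) ship inside the Ultralytics package under ultralytics/cfg/trackers/. To customize one, copy it, edit the copy, and pass its path as the tracker argument. Common knobs:

track_buffer: how many frames a track survives without a match (default 30). Larger values are more robust to occlusion but leave more zombie tracks.
match_thresh: threshold for matching a detection to a track. In the Ultralytics implementation it caps the matching cost (roughly 1 - IoU), so higher values are more lenient. Raise it if your motion is fast and the box jumps a lot.
new_track_thresh: confidence floor for starting a new track. Raise it if you're getting too many short-lived false tracks.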
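One way to experiment with these knobs without touching the packaged files is to copy a config and override a few top-level keys. The helper below is a hypothetical sketch using only the standard library; it assumes the config is a flat file of key: value lines, and it preserves comments and all other settings.

```python
import re
from pathlib import Path


def write_tracker_config(src, dst, **overrides):
    """Copy a tracker YAML, replacing the values of top-level keys.

    Raises KeyError if an override names a key the source file lacks,
    which catches typos such as `track_bufer`.
    """
    remaining = dict(overrides)
    out = []
    for line in Path(src).read_text().splitlines():
        # Match `key: value` with an optional trailing `# comment`.
        m = re.match(r"^(\w+):\s*([^#]*?)(\s*#.*)?$", line)
        if m and m.group(1) in remaining:
            comment = m.group(3) or ""
            line = f"{m.group(1)}: {remaining.pop(m.group(1))}{comment}"
        out.append(line)
    if remaining:
        raise KeyError(f"keys not found in {src}: {sorted(remaining)}")
    dst = Path(dst)
    dst.write_text("\n".join(out) + "\n")
    return dst
```

For example, write_tracker_config("bytetrack.yaml", "bytetrack_long.yaml", track_buffer=60) applied to a local copy of the default config produces a file you can then pass as tracker="bytetrack_long.yaml" to model.track.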

ID flicker is usually a tuning problem

If a stationary object's ID jumps every few frames, the tracker is creating new tracks instead of matching old ones. Check match_thresh (raise it to make matching more lenient) and track_buffer (raise it so lost tracks survive longer).

Stream sources

The same API works for live cameras; the Predict mode docs cover every supported source type. With no tracker argument, model.track falls back to its default, botsort.yaml:

for r in model.track("rtsp://camera.local/stream", stream=True, persist=True):
    for box in r.boxes:
        # do something with box.id, box.cls, box.xyxy
        pass

Just remember: use stream=True for live or long sources. Without it, model.track collects every frame's Results object in a list, so memory grows with the length of the source.

Try It

Run tracking with both bytetrack.yaml and botsort.yaml on the same video, passing save=True so each run writes an annotated video. Compare the saved outputs visually and find a moment where one keeps an ID and the other loses it. That's the difference between the two trackers.

Done When

You've finished the lesson when all of these are true.

You can write a tracking loop that uses box.id.
You've watched the difference between ByteTrack and BoT-SORT on real video.
You can name a scene type that warrants BoT-SORT and one that doesn't.

Solution

from collections import defaultdict
from ultralytics import YOLO

model = YOLO("runs/detect/forklift_v1/weights/best.pt")
counts = defaultdict(set)

for r in model.track(
    source="input.mp4",
    persist=True,
    stream=True,
    tracker="botsort.yaml",
    conf=0.4,
):
    for box in r.boxes:
        if box.id is None:
            continue
        counts[r.names[int(box.cls)]].add(int(box.id))

for cls, ids in counts.items():
    print(f"{cls}: {len(ids)} unique objects")

What's next

With persistent IDs in hand, we can build the most common production pipelines: counts, heatmaps, and speeds.
