
Object Tracking with ByteTrack and BoT-SORT

Persistent IDs across frames — the foundation of every counting, alerting, and analytics pipeline.

Object detection finds objects per frame. Object tracking, also called multi-object tracking, links those detections through time so the same physical object keeps the same ID. Once you have IDs, the interesting downstream pipelines (counting, dwell time, trajectories, speed estimation) reduce to bookkeeping keyed on those IDs.

Outcome

Run Ultralytics YOLO tracking with ByteTrack and BoT-SORT, choose between them, and use box.id in a real pipeline.

Fast Track
If you already know your way around, here's the short version.
  1. model.track(source, persist=True, tracker='bytetrack.yaml').

  2. Each box gets a stable id across frames.

  3. ByteTrack — fast, simple, great default.

  4. BoT-SORT — slower, robust to occlusion (uses appearance features).

Hands-on

What a tracker actually does

Figure: YOLO multi-object tracking examples

A tracker maintains a set of "tracks" — hypotheses about real-world objects. Each frame's detections are matched to tracks based on:

  1. Motion prediction — where the track should be this frame (Kalman filter).
  2. Spatial overlap — IoU between predicted track location and detection.
  3. Appearance (BoT-SORT only) — visual features for re-identification, matching the detection back to its track even after occlusion.

Tracks that don't get matched for a few frames are deleted. New unmatched detections become new tracks.
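
The matching step can be sketched in a few lines of plain Python. This is a simplified illustration, not the Ultralytics implementation: it assumes the motion-predicted boxes are already given, uses greedy best-IoU-first pairing where production trackers typically solve a linear assignment problem, and the names iou, greedy_match, and min_iou are made up for this example.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(predicted, detections, min_iou=0.3):
    """Pair predicted track boxes with detections, highest IoU first.

    predicted: dict of track_id -> predicted box for this frame.
    detections: list of boxes from the detector.
    Returns (matches, unmatched_track_ids, unmatched_detection_indices).
    """
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in predicted.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    matches, used_dets = {}, set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # every remaining pair overlaps even less
        if tid in matches or j in used_dets:
            continue
        matches[tid] = j
        used_dets.add(j)
    unmatched_tracks = [t for t in predicted if t not in matches]
    unmatched_dets = [j for j in range(len(detections)) if j not in used_dets]
    return matches, unmatched_tracks, unmatched_dets


# Toy frame: two existing tracks, three detections.
predicted = {1: (10, 10, 50, 50), 2: (100, 100, 140, 140)}
detections = [(102, 98, 142, 138), (12, 11, 52, 51), (300, 300, 320, 320)]
matches, lost, new = greedy_match(predicted, detections)
print(matches, lost, new)
```

Unmatched tracks are the ones that age toward deletion, and unmatched detections are the candidates for new tracks.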

ByteTrack: fast and robust

from ultralytics import YOLO

model = YOLO("yolo26n.pt")
results = model.track(
    "input.mp4",
    persist=True,
    tracker="bytetrack.yaml",
    conf=0.25,
)

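ByteTrack's trick: instead of discarding low-confidence detections up front, it runs a second matching pass that pairs them with the tracks the first (high-confidence) pass left unmatched. Low-confidence detections that still match nothing are dropped rather than started as new tracks. This recovers tracks during partial occlusion, when confidence dips temporarily.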
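
A minimal sketch of that two-pass idea, under stated assumptions: the function names and the high, low, and min_iou values are illustrative (the shipped config exposes the corresponding thresholds as track_high_thresh and track_low_thresh), the pairing is greedy, and Kalman prediction is omitted. It is not the library's code.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy(tracks, boxes, min_iou):
    """Greedy best-IoU-first pairing of track boxes with detection boxes."""
    pairs = sorted(
        ((iou(t, b), tid, j) for tid, t in tracks.items() for j, b in enumerate(boxes)),
        reverse=True,
    )
    matched, used = {}, set()
    for score, tid, j in pairs:
        if score < min_iou:
            break
        if tid not in matched and j not in used:
            matched[tid] = boxes[j]
            used.add(j)
    leftover_tracks = {tid: t for tid, t in tracks.items() if tid not in matched}
    leftover_boxes = [b for j, b in enumerate(boxes) if j not in used]
    return matched, leftover_tracks, leftover_boxes


def byte_associate(tracks, detections, high=0.5, low=0.1, min_iou=0.3):
    """Two-pass association: high-confidence boxes first, then low-confidence ones."""
    high_boxes = [box for box, conf in detections if conf >= high]
    low_boxes = [box for box, conf in detections if low <= conf < high]
    # Pass 1: confident detections against all tracks.
    first, leftover_tracks, new_candidates = greedy(tracks, high_boxes, min_iou)
    # Pass 2: only still-unmatched tracks, against low-confidence detections.
    second, still_lost, _discarded = greedy(leftover_tracks, low_boxes, min_iou)
    # Unmatched low-confidence boxes are discarded; they never start a track.
    return {**first, **second}, sorted(still_lost), new_candidates


tracks = {1: (10, 10, 50, 50), 2: (100, 100, 140, 140)}
detections = [
    ((11, 10, 51, 50), 0.90),      # confident, overlaps track 1
    ((101, 100, 141, 140), 0.20),  # weak (partly occluded), overlaps track 2
    ((300, 300, 340, 340), 0.15),  # weak, overlaps nothing
    ((200, 200, 240, 240), 0.80),  # confident, overlaps nothing
]
updated, lost, new_candidates = byte_associate(tracks, detections)
print(updated, lost, new_candidates)
```

Track 2 survives only because of the second pass; a single-threshold tracker would have marked it lost on this frame.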

BoT-SORT: appearance-aware

results = model.track(
    "input.mp4",
    persist=True,
    tracker="botsort.yaml",
)

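BoT-SORT extracts an appearance embedding per detection and uses it for re-identification when objects briefly disappear (behind a tree, a pole, or another object). It is slower, sometimes by 30% or more, but in scenes with frequent occlusions it keeps IDs noticeably more consistent. One caveat: in the shipped botsort.yaml the appearance branch is gated by the with_reid setting, which ships disabled, so check that file in your installed version before relying on re-identification.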
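
To make the re-identification idea concrete, here is a hedged sketch of matching a new detection's embedding against embeddings stored for lost tracks. It assumes embeddings are plain lists of floats and uses cosine similarity with a made-up min_similarity cutoff; reidentify is a hypothetical helper, not part of the Ultralytics API, and the real tracker combines appearance with motion and overlap cues.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def reidentify(lost_tracks, embedding, min_similarity=0.75):
    """Return the ID of the lost track that best matches, or None.

    lost_tracks: dict of track_id -> last stored embedding.
    """
    best_id, best_sim = None, min_similarity
    for tid, stored in lost_tracks.items():
        sim = cosine_similarity(stored, embedding)
        if sim >= best_sim:
            best_id, best_sim = tid, sim
    return best_id


# Toy 3-dimensional embeddings; real ones have hundreds of dimensions.
lost = {7: [0.9, 0.1, 0.0], 8: [0.0, 0.2, 0.95]}
same_object = reidentify(lost, [0.85, 0.15, 0.05])
stranger = reidentify(lost, [0.0, 1.0, 0.0])
print(same_object, stranger)
```

A detection that looks like nothing seen before returns None and would start a fresh track instead of stealing an old ID.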

Choose      When
ByteTrack   Open scenes, simple paths, throughput matters
BoT-SORT    Crowded scenes, frequent occlusions, IDs must persist

Using IDs in a pipeline

from collections import defaultdict
from ultralytics import YOLO

model = YOLO("yolo26n.pt")

counts = defaultdict(set)

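for r in model.track("input.mp4", stream=True, persist=True, tracker="bytetrack.yaml"):
    for box in r.boxes:
        # box.id is None when the tracker has not produced a track ID for this box
        if box.id is None:
            continue
        cls = r.names[int(box.cls)]
        # A set per class records each track ID once, however many frames it appears in
        counts[cls].add(int(box.id))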

for cls, ids in counts.items():
    print(f"{cls}: {len(ids)} unique objects")

That's the whole pipeline. Tracking gives you stable IDs; counting unique IDs means an object seen in a thousand frames is still counted once, so counts stay accurate on long videos as long as the IDs themselves stay stable. Built-in solutions like region counting, heatmaps, and distance calculation are all thin layers on top of this loop.
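
The same loop extends to dwell time with very little extra state. The sketch below is an illustration under assumptions: dwell_times is a hypothetical helper, the (frame_index, track_id) pairs would come from wrapping the tracking loop in enumerate(), and fps is something you supply from the video metadata.

```python
def dwell_times(observations, fps):
    """Seconds between first and last sighting of each track ID.

    observations: iterable of (frame_index, track_id) pairs, in any order.
    fps: frames per second of the source video.
    """
    first, last = {}, {}
    for frame, tid in observations:
        first[tid] = min(frame, first.get(tid, frame))
        last[tid] = max(frame, last.get(tid, frame))
    # +1 so that an ID seen in a single frame dwells for one frame, not zero.
    return {tid: (last[tid] - first[tid] + 1) / fps for tid in first}


# In a real pipeline the pairs would be collected roughly like this:
#   for frame_idx, r in enumerate(model.track(..., stream=True)):
#       for box in r.boxes:
#           if box.id is not None:
#               observations.append((frame_idx, int(box.id)))
observations = [(f, 1) for f in range(0, 30)] + [(f, 2) for f in range(5, 15)]
seconds = dwell_times(observations, fps=10)
print(seconds)
```

Gaps inside a track (frames where the ID was briefly missing) still count toward dwell time here; subtract them if your use case needs visible time only.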

Tuning the tracker

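The tracker config files (bytetrack.yaml, botsort.yaml) ship inside the Ultralytics package under ultralytics/cfg/trackers/. To tune one, copy it, edit the copy, and pass the copy's path as the tracker argument. Common knobs: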

  • track_buffer: how many frames a track survives without a match (default ~30 frames). Larger = more robust to occlusion, but more zombie tracks.
  • match_thresh: matching threshold between detections and tracks. In the Ultralytics trackers it caps the association cost (roughly 1 − IoU), so higher values are more lenient. Raise it if your motion is fast and the box jumps a lot.
  • new_track_thresh: confidence floor for starting a new track. Raise it if you're getting too many short-lived false tracks.

ID flicker is usually a tuning problem

If a stationary object's ID jumps every few frames, the tracker is creating new tracks instead of matching old ones. Check match_thresh (raise it, so looser matches are accepted) and track_buffer (raise it).
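
One way to produce a tuned copy without hand-editing is a small text rewrite of the shipped file. The helper below is hypothetical and only handles top-level `key: value` lines; the base text is an illustrative excerpt (with a comment of my own), not the full config, and the override values are placeholders rather than recommendations. In practice, start from the complete file in your installed version, since the tracker expects every key to be present.

```python
import re


def override_tracker_config(yaml_text, overrides):
    """Return yaml_text with selected top-level scalars replaced.

    Trailing comments are preserved. Raises KeyError if a requested key
    is absent, so typos do not pass silently.
    """
    remaining = dict(overrides)
    out = []
    for line in yaml_text.splitlines():
        m = re.match(r"^(\w+):\s*([^#]*?)(\s*#.*)?$", line)
        if m and m.group(1) in remaining:
            key = m.group(1)
            comment = m.group(3) or ""
            line = f"{key}: {remaining.pop(key)}{comment}"
        out.append(line)
    if remaining:
        raise KeyError(f"keys not found in base config: {sorted(remaining)}")
    return "\n".join(out) + "\n"


# Illustrative excerpt only; read the real file from your installation.
base = """tracker_type: bytetrack
track_buffer: 30 # frames a lost track is kept
match_thresh: 0.8
"""

tuned = override_tracker_config(base, {"track_buffer": 60, "match_thresh": 0.9})
print(tuned)

# Next step (not run here): write `tuned` to e.g. custom_bytetrack.yaml and
# call model.track(..., tracker="custom_bytetrack.yaml").
```

Change one knob at a time and re-run on the same clip, otherwise it is hard to tell which setting fixed or caused a behavior.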

Stream sources

Same API for live cameras — the Predict mode docs cover every supported source type:

for r in model.track("rtsp://camera.local/stream", stream=True, persist=True):
    for box in r.boxes:
        # do something with box.id, box.cls, box.xyxy
        pass

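Just remember: stream=True is effectively mandatory for live or long sources. Without it, the call collects a result for every frame into a single list, so memory grows with the length of the source and never levels off on a live feed.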
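
The same concern applies to anything you accumulate per ID yourself: on a feed that runs for days, an unbounded dict of trajectories will eventually exhaust memory. A minimal sketch of a bounded alternative follows; TrajectoryBuffer and its max_points and max_idle_frames parameters are hypothetical names chosen for this example.

```python
from collections import defaultdict, deque


class TrajectoryBuffer:
    """Keeps the most recent points per track ID and forgets idle IDs."""

    def __init__(self, max_points=64, max_idle_frames=300):
        self.max_idle_frames = max_idle_frames
        # deque(maxlen=...) silently drops the oldest point when full.
        self.points = defaultdict(lambda: deque(maxlen=max_points))
        self.last_seen = {}

    def update(self, frame_idx, track_id, center):
        """Record the box center for track_id at frame_idx."""
        self.points[track_id].append(center)
        self.last_seen[track_id] = frame_idx

    def prune(self, frame_idx):
        """Drop IDs not seen for more than max_idle_frames; return them."""
        stale = [t for t, seen in self.last_seen.items()
                 if frame_idx - seen > self.max_idle_frames]
        for t in stale:
            del self.points[t]
            del self.last_seen[t]
        return stale


buf = TrajectoryBuffer(max_points=3, max_idle_frames=5)
buf.update(0, 2, (0, 0))          # ID 2 appears once, then vanishes
for f in range(10):
    buf.update(f, 1, (f, f))      # ID 1 is seen on every frame
removed = buf.prune(9)
print(list(buf.points[1]), removed)
```

Calling prune every few hundred frames keeps the memory footprint proportional to the number of objects currently in view rather than to total runtime.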

Try It

Run tracking with both bytetrack.yaml and botsort.yaml on the same video. Compare the saved outputs visually — find a moment where one keeps an ID and the other loses it. That's the difference between the two trackers.
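
If you want a number to go with the visual comparison, log the track IDs seen on each frame for both runs and summarize them. The sketch below assumes you have collected those per-frame ID lists yourself; id_stats is a hypothetical helper, and the two toy runs are made-up inputs to show the shape of the output, not measurements of either tracker.

```python
from collections import Counter


def id_stats(frames, short_below=3):
    """Summarize ID stability from per-frame lists of track IDs.

    Fewer unique IDs and longer tracks on the same video usually mean
    fewer ID switches. Tracks shorter than short_below frames are counted
    separately as likely fragments.
    """
    lengths = Counter(tid for ids in frames for tid in ids)
    total = len(lengths)
    return {
        "unique_ids": total,
        "mean_track_length": sum(lengths.values()) / total if total else 0.0,
        "short_tracks": sum(1 for n in lengths.values() if n < short_below),
    }


# Toy inputs: one object, briefly hidden on the third frame.
run_a = [[1], [1], [], [2], [2], [2]]   # ID changed after the gap
run_b = [[1], [1], [], [1], [1], [1]]   # ID survived the gap
stats_a = id_stats(run_a)
stats_b = id_stats(run_b)
print(stats_a)
print(stats_b)
```

These numbers only compare runs on the same video with the same detector settings; they say nothing about which IDs are correct.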

Done When
You've finished the lesson when all of these are true.
  • You can write a tracking loop that uses box.id.

  • You've watched the difference between ByteTrack and BoT-SORT on real video.

  • You can name a scene type that warrants BoT-SORT and one that doesn't.

Solution
from collections import defaultdict
from ultralytics import YOLO

model = YOLO("runs/detect/forklift_v1/weights/best.pt")
counts = defaultdict(set)

for r in model.track(
    source="input.mp4",
    persist=True,
    stream=True,
    tracker="botsort.yaml",
    conf=0.4,
):
    for box in r.boxes:
        if box.id is None:
            continue
        counts[r.names[int(box.cls)]].add(int(box.id))

for cls, ids in counts.items():
    print(f"{cls}: {len(ids)} unique objects")

What's next

With persistent IDs in hand, we can build the most common production pipelines: counts, heatmaps, and speeds.