Object Tracking with ByteTrack and BoT-SORT
Persistent IDs across frames — the foundation of every counting, alerting, and analytics pipeline.
Object detection finds objects per frame. Object tracking (also called multi-object tracking) links those detections through time so the same physical object keeps the same ID. Once you have IDs, the interesting downstream pipelines (counting, dwell time, trajectories, speed estimation) become straightforward to build.
Run Ultralytics YOLO tracking with ByteTrack and BoT-SORT, choose between them, and use box.id in a real pipeline.
- model.track(source, persist=True, tracker='bytetrack.yaml').
- Each box gets a stable id across frames.
- ByteTrack: fast, simple, great default.
- BoT-SORT: slower, robust to occlusion (uses appearance features).
Hands-on
What a tracker actually does
A tracker maintains a set of "tracks" — hypotheses about real-world objects. Each frame's detections are matched to tracks based on:
- Motion prediction — where the track should be this frame (Kalman filter).
- Spatial overlap — IoU between predicted track location and detection.
- Appearance (BoT-SORT only) — visual features for re-identification, matching the detection back to its track even after occlusion.
Tracks that don't get matched for a few frames are deleted. New unmatched detections become new tracks.
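The matching and track lifecycle described above can be sketched in a few lines. This is a toy illustration only, not the ByteTrack or BoT-SORT implementation: it matches greedily on IoU alone, with no Kalman prediction and no appearance features. `ToyTracker`, `min_iou`, and `max_missed` are hypothetical names and values chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class ToyTracker:
    """Illustrative IoU-only tracker (assumption: no motion model, no appearance)."""

    def __init__(self, min_iou=0.3, max_missed=2):
        self.min_iou = min_iou        # hypothetical overlap floor for a match
        self.max_missed = max_missed  # frames a track may go unmatched
        self.tracks = {}              # track_id -> {"box": box, "missed": int}
        self.next_id = 1

    def update(self, detections):
        """Match detections to tracks greedily by IoU; return {track_id: box}."""
        pairs = sorted(
            ((iou(t["box"], det), tid, di)
             for tid, t in self.tracks.items()
             for di, det in enumerate(detections)),
            reverse=True,
        )
        used_tracks, used_dets, output = set(), set(), {}
        for score, tid, di in pairs:
            if score < self.min_iou:
                break  # pairs are sorted, so nothing better remains
            if tid in used_tracks or di in used_dets:
                continue
            used_tracks.add(tid)
            used_dets.add(di)
            self.tracks[tid] = {"box": detections[di], "missed": 0}
            output[tid] = detections[di]
        # Unmatched tracks age and are deleted once they exceed max_missed.
        for tid in list(self.tracks):
            if tid not in used_tracks:
                self.tracks[tid]["missed"] += 1
                if self.tracks[tid]["missed"] > self.max_missed:
                    del self.tracks[tid]
        # Unmatched detections become new tracks.
        for di, det in enumerate(detections):
            if di not in used_dets:
                self.tracks[self.next_id] = {"box": det, "missed": 0}
                output[self.next_id] = det
                self.next_id += 1
        return output


tracker = ToyTracker()
frame1 = tracker.update([(0, 0, 10, 10), (50, 50, 60, 60)])
frame2 = tracker.update([(1, 0, 11, 10)])  # first object moved one pixel right
print(sorted(frame1))  # [1, 2]
print(sorted(frame2))  # [1]
```

The second object is not deleted in frame 2. It stays in `tracker.tracks` with one missed frame, which is the behavior a real tracker's buffer gives you at a larger scale.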
ByteTrack: fast and robust
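
    from ultralytics import YOLO

    model = YOLO("yolo26n.pt")

    # persist=True keeps existing tracks alive between successive track() calls
    results = model.track(
        "input.mp4",
        persist=True,
        tracker="bytetrack.yaml",
        conf=0.25,
    )

ByteTrack's trick: instead of discarding low-confidence detections up front, it matches tracks to high-confidence detections first, then tries to match the tracks that are still unmatched against the low-confidence ones. Low-confidence detections that match no track are dropped rather than started as new tracks. This recovers tracks during partial occlusion, when confidence dips temporarily.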
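To make the low-confidence idea concrete, here is a minimal sketch of a two-stage association. It is an assumption-laden simplification: matching is greedy on raw IoU (the real tracker uses motion prediction and an assignment solver), and the function names and the `high`, `low`, and `min_iou` values are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks, boxes, min_iou):
    """Greedy one-to-one IoU matching. tracks: {id: box}. Returns {id: box_index}."""
    pairs = sorted(
        ((iou(tbox, box), tid, bi)
         for tid, tbox in tracks.items()
         for bi, box in enumerate(boxes)),
        reverse=True,
    )
    assigned, used = {}, set()
    for score, tid, bi in pairs:
        if score < min_iou:
            break
        if tid not in assigned and bi not in used:
            assigned[tid] = bi
            used.add(bi)
    return assigned


def two_stage_associate(tracks, detections, high=0.5, low=0.1, min_iou=0.3):
    """tracks: {track_id: box}; detections: list of (box, confidence).

    Returns (matches, new_candidates), where matches maps track_id -> detection
    and new_candidates are unmatched high-confidence detections.
    """
    high_dets = [d for d in detections if d[1] >= high]
    low_dets = [d for d in detections if low <= d[1] < high]

    # Stage 1: every track competes for the high-confidence detections.
    first = greedy_match(tracks, [d[0] for d in high_dets], min_iou)
    matches = {tid: high_dets[i] for tid, i in first.items()}

    # Stage 2: only tracks left over from stage 1 see low-confidence detections.
    leftover = {tid: box for tid, box in tracks.items() if tid not in first}
    second = greedy_match(leftover, [d[0] for d in low_dets], min_iou)
    matches.update({tid: low_dets[i] for tid, i in second.items()})

    # Unmatched high-confidence detections may start new tracks;
    # unmatched low-confidence detections are discarded.
    matched_high = set(first.values())
    new_candidates = [d for i, d in enumerate(high_dets) if i not in matched_high]
    return matches, new_candidates


tracks = {1: (0, 0, 10, 10), 2: (50, 50, 60, 60)}
detections = [
    ((1, 0, 11, 10), 0.9),          # confident view of object 1
    ((51, 50, 61, 60), 0.2),        # object 2, partially occluded
    ((200, 200, 210, 210), 0.15),   # weak detection near no track
    ((100, 100, 110, 110), 0.8),    # confident detection of something new
]
matches, new_candidates = two_stage_associate(tracks, detections)
print(sorted(matches))      # [1, 2]
print(len(new_candidates))  # 1
```

Track 2 survives only because the second stage considered the 0.2-confidence box. The weak detection at (200, 200) matches nothing and never becomes a track.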
BoT-SORT: appearance-aware
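
    # Reuses the model loaded above; only the tracker config changes
    results = model.track(
        "input.mp4",
        persist=True,
        tracker="botsort.yaml",
    )

BoT-SORT can extract an appearance embedding per detection and use it for re-identification when objects briefly disappear (behind a tree, a pole, or another object). In recent Ultralytics releases this is controlled by the with_reid setting in botsort.yaml, which may be off by default, so check the config in your installed version. It's slower than ByteTrack, sometimes by 30% or more, but if your scene has frequent occlusions, the difference in ID consistency is dramatic.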
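As a rough illustration of appearance-based re-identification, the sketch below compares a new detection's embedding against embeddings stored for lost tracks using cosine similarity. Everything here is assumed for the example: the three-dimensional vectors stand in for real ReID features, `reidentify` and `min_similarity` are hypothetical, and an actual tracker would combine this score with motion and overlap cues rather than use it alone.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def reidentify(lost_tracks, embedding, min_similarity=0.8):
    """Return the ID of the best-matching lost track, or None if none is close enough.

    lost_tracks maps track_id -> stored appearance embedding.
    """
    best_id, best_sim = None, min_similarity
    for track_id, stored in lost_tracks.items():
        sim = cosine_similarity(stored, embedding)
        if sim >= best_sim:
            best_id, best_sim = track_id, sim
    return best_id


# Toy embeddings (assumption: real ones come from an appearance model).
lost = {7: [1.0, 0.0, 0.2], 9: [0.0, 1.0, 0.1]}
print(reidentify(lost, [0.9, 0.1, 0.25]))  # 7
print(reidentify(lost, [0.5, 0.5, 0.0]))   # None
```

The second query looks equally unlike both stored tracks, so it is treated as a new object instead of being forced onto an old ID.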
| Choose | When |
|---|---|
| ByteTrack | Open scenes, simple paths, throughput matters |
| BoT-SORT | Crowded scenes, frequent occlusions, IDs must persist |
Using IDs in a pipeline

    from collections import defaultdict

    from ultralytics import YOLO

    model = YOLO("yolo26n.pt")
    counts = defaultdict(set)  # class name -> set of track IDs seen so far

    for r in model.track("input.mp4", stream=True, persist=True, tracker="bytetrack.yaml"):
        for box in r.boxes:
            if box.id is None:  # no track ID assigned to this detection
                continue
            cls = r.names[int(box.cls)]
            counts[cls].add(int(box.id))

    for cls, ids in counts.items():
        print(f"{cls}: {len(ids)} unique objects")

That's the whole pipeline. Tracking gives you stable IDs; counting unique IDs means an object seen in thousands of frames is still counted once, even on long videos, as long as its ID holds (every ID switch adds one to the count). Built-in solutions like region counting, heatmaps, and distance calculation are all thin layers on top of this loop.
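Dwell time is one example of how little extra code sits on top of that loop. The sketch below assumes you have already collected `(frame_index, track_id)` pairs (for instance by wrapping the tracking loop in `enumerate`) and that you know the source frame rate. `dwell_times` is a hypothetical helper, and it measures the span from first to last sighting, so gaps inside that span are counted as presence.

```python
def dwell_times(observations, fps):
    """Seconds between first and last sighting of each track.

    observations: iterable of (frame_index, track_id) pairs.
    fps: frames per second of the source (assumed known and constant).
    """
    first_seen, last_seen = {}, {}
    for frame_idx, track_id in observations:
        first_seen[track_id] = min(frame_idx, first_seen.get(track_id, frame_idx))
        last_seen[track_id] = max(frame_idx, last_seen.get(track_id, frame_idx))
    # +1 so a track seen in a single frame dwells for one frame, not zero.
    return {tid: (last_seen[tid] - first_seen[tid] + 1) / fps for tid in first_seen}


# Synthetic data: track 1 visible in frames 0-19, track 2 in frames 5-9.
observations = [(f, 1) for f in range(20)] + [(f, 2) for f in range(5, 10)]
print(dwell_times(observations, fps=10))  # {1: 2.0, 2: 0.5}
```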
Tuning the tracker
The tracker config files (bytetrack.yaml, botsort.yaml) live in the Ultralytics package. Common knobs:
- track_buffer: how many frames a track survives without a match (default ~30 frames). Larger = more robust to occlusion, but more zombie tracks.
- match_thresh: threshold for associating detections with tracks. In the Ultralytics configs it bounds the matching cost rather than the IoU itself, so higher values make matching more lenient. Raise it if your motion is fast and the box jumps a lot.
- new_track_thresh: confidence floor for starting a new track. Raise it if you're getting too many short-lived false tracks.
If a stationary object's ID jumps every few frames, the tracker is creating new tracks instead of matching old ones. Check match_thresh (make it more lenient, which in the Ultralytics configs means raising it) and track_buffer (raise it).
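One way to experiment with these knobs without touching the packaged files is to copy a config, change a few values, and save it under a new name. The helper below is a minimal sketch that edits flat `key: value` lines in YAML text and leaves everything else alone. `override_tracker_config` is a hypothetical function, and the sample text is a stand-in with illustrative values, not the full contents of the shipped bytetrack.yaml.

```python
import re


def override_tracker_config(yaml_text, overrides):
    """Return yaml_text with top-level `key: value` lines replaced.

    Only flat `key: value` lines are handled; trailing comments are preserved.
    Raises KeyError if a requested key is not present in the text.
    """
    remaining = dict(overrides)
    out = []
    for line in yaml_text.splitlines():
        m = re.match(r"^(\w+):\s*([^#]*?)(\s*#.*)?$", line)
        if m and m.group(1) in remaining:
            key = m.group(1)
            comment = m.group(3) or ""
            line = f"{key}: {remaining.pop(key)}{comment}"
        out.append(line)
    if remaining:
        raise KeyError(f"keys not found in config: {sorted(remaining)}")
    return "\n".join(out) + "\n"


# Stand-in config text (assumption: copy the real file from your installation).
sample = """\
tracker_type: bytetrack
track_buffer: 30  # frames a lost track is kept
match_thresh: 0.8
"""
print(override_tracker_config(sample, {"track_buffer": 60}))
```

Write the returned text to a file such as custom_bytetrack.yaml (a name chosen here for illustration) and pass that path as the tracker argument to model.track.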
Stream sources
Same API for live cameras — the Predict mode docs cover every supported source type:

    for r in model.track("rtsp://camera.local/stream", stream=True, persist=True):
        for box in r.boxes:
            # do something with box.id, box.cls, box.xyxy
            pass

Just remember: stream=True is mandatory for live or long sources. It makes track() return a generator that yields one result per frame; without it, results for every frame accumulate in a list and memory grows unbounded.
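As one possible way to fill in the "do something" step, the sketch below turns each tracked box into a plain dictionary with its ID, class label, and box center. `boxes_to_records` is a hypothetical helper. The demo uses `SimpleNamespace` stubs in place of real results so it runs without a model; with real output you would pass `r.boxes` and `r.names` instead.

```python
from types import SimpleNamespace


def boxes_to_records(boxes, names):
    """Convert tracked boxes into plain dicts, skipping boxes without a track ID."""
    records = []
    for box in boxes:
        if box.id is None:  # no track ID assigned to this detection
            continue
        # xyxy holds one row per box: (x1, y1, x2, y2) in pixels.
        x1, y1, x2, y2 = (float(v) for v in box.xyxy[0])
        records.append({
            "id": int(box.id),
            "label": names[int(box.cls)],
            "center": ((x1 + x2) / 2, (y1 + y2) / 2),
        })
    return records


# Stub boxes standing in for real tracker output (assumption for the demo).
demo_boxes = [
    SimpleNamespace(id=3, cls=0, xyxy=[[0, 0, 10, 20]]),
    SimpleNamespace(id=None, cls=0, xyxy=[[5, 5, 6, 6]]),
]
records = boxes_to_records(demo_boxes, {0: "person"})
print(records)  # [{'id': 3, 'label': 'person', 'center': (5.0, 10.0)}]
```

Plain records like these are easy to log, send over a message queue, or append to a per-ID trajectory, and they hold no reference to the frame data.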
Run tracking with both bytetrack.yaml and botsort.yaml on the same video. Compare the saved outputs visually — find a moment where one keeps an ID and the other loses it. That's the difference between the two trackers.
- You can write a tracking loop that uses box.id.
- You've watched the difference between ByteTrack and BoT-SORT on real video.
- You can name a scene type that warrants BoT-SORT and one that doesn't.
Solution

    from collections import defaultdict

    from ultralytics import YOLO

    model = YOLO("runs/detect/forklift_v1/weights/best.pt")
    counts = defaultdict(set)  # class name -> set of track IDs seen so far

    for r in model.track(
        source="input.mp4",
        persist=True,
        stream=True,
        tracker="botsort.yaml",
        conf=0.4,
    ):
        for box in r.boxes:
            if box.id is None:
                continue
            counts[r.names[int(box.cls)]].add(int(box.id))

    for cls, ids in counts.items():
        print(f"{cls}: {len(ids)} unique objects")

With persistent IDs in hand, we can build the most common production pipelines: counts, heatmaps, and speeds.