Object Tracking with ByteTrack and BoT-SORT
Persistent IDs across frames — the foundation of every counting, alerting, and analytics pipeline.
Object detection finds objects per frame. Object tracking, also called multi-object tracking (MOT), links those detections through time so the same physical object keeps the same ID. Once you have IDs, the interesting downstream pipelines (counting, dwell time, trajectories, speed estimation) become straightforward to build.
Run Ultralytics YOLO tracking with ByteTrack and BoT-SORT, choose between them, and use box.id in a real pipeline.
- model.track(source, persist=True, tracker='bytetrack.yaml'): each box gets a stable id across frames.
- ByteTrack: fast, simple, great default.
- BoT-SORT: slower, robust to occlusion (uses appearance features).
Hands-on
What a tracker actually does
A tracker maintains a set of "tracks" — hypotheses about real-world objects. Each frame's detections are matched to tracks based on:
- Motion prediction — where the track should be this frame (Kalman filter).
- Spatial overlap — IoU between predicted track location and detection.
- Appearance (BoT-SORT only) — visual features for re-identification, matching the detection back to its track even after occlusion.
Tracks that don't get matched for a few frames are deleted. New unmatched detections become new tracks.
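The matching step described above can be sketched in plain Python. This is a simplified illustration, not the code either tracker runs: it pairs boxes greedily by IoU, whereas production trackers typically solve a global assignment problem and predict track positions with a Kalman filter first. The function names and the min_iou value are assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(predicted, detections, min_iou=0.3):
    """Pair predicted track boxes with detections, best overlap first.

    predicted: dict of integer track_id -> box; detections: list of boxes.
    Returns (matches, unmatched_track_ids, unmatched_detection_indices).
    min_iou is an illustrative threshold, not a library default.
    """
    pairs = sorted(
        (
            (iou(box, det), tid, j)
            for tid, box in predicted.items()
            for j, det in enumerate(detections)
        ),
        reverse=True,
    )
    matches, used_dets = {}, set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # every remaining pair overlaps even less
        if tid in matches or j in used_dets:
            continue
        matches[tid] = j
        used_dets.add(j)
    unmatched_tracks = [t for t in predicted if t not in matches]
    unmatched_dets = [j for j in range(len(detections)) if j not in used_dets]
    return matches, unmatched_tracks, unmatched_dets
```

Unmatched detections are the candidates for new tracks, and unmatched tracks are the ones that age toward deletion.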
ByteTrack: fast and robust
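```python
from ultralytics import YOLO

model = YOLO("yolo26n.pt")

# Track objects through the video using the ByteTrack config.
results = model.track(
    "input.mp4",
    persist=True,  # keep existing tracker state between calls
    tracker="bytetrack.yaml",
    conf=0.25,  # detection confidence threshold
)
```
ByteTrack's trick: it matches high-confidence detections to tracks first, then runs a second matching pass that pairs the still-unmatched tracks with low-confidence detections. Low-confidence detections that match nothing are discarded instead of starting new tracks. This recovers tracks during partial occlusion, when confidence dips temporarily.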
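To make the low-confidence idea concrete, here is a minimal sketch of a two-pass association in plain Python. It is an illustration under simplifying assumptions (greedy IoU pairing, no motion model), not the ByteTrack implementation, and the names two_stage_associate, high, low, and min_iou are hypothetical values chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def _match(track_ids, tracks, det_idxs, dets, min_iou):
    """Greedy best-IoU-first pairing limited to the given tracks and detections."""
    pairs = sorted(
        ((iou(tracks[t], dets[j][0]), t, j) for t in track_ids for j in det_idxs),
        reverse=True,
    )
    matches, used = {}, set()
    for score, t, j in pairs:
        if score < min_iou:
            break
        if t in matches or j in used:
            continue
        matches[t] = j
        used.add(j)
    return matches


def two_stage_associate(tracks, dets, high=0.5, low=0.1, min_iou=0.3):
    """tracks: dict of integer id -> box; dets: list of (box, score) pairs.

    The thresholds are illustrative, not library defaults.
    Returns (matches, new_track_candidates, lost_track_ids).
    """
    high_idx = [j for j, (_, s) in enumerate(dets) if s >= high]
    low_idx = [j for j, (_, s) in enumerate(dets) if low <= s < high]
    # Pass 1: confident detections against all tracks.
    first = _match(list(tracks), tracks, high_idx, dets, min_iou)
    # Pass 2: leftover tracks against low-confidence detections.
    leftover = [t for t in tracks if t not in first]
    second = _match(leftover, tracks, low_idx, dets, min_iou)
    matches = {**first, **second}
    # Only unmatched confident detections may start new tracks.
    new_candidates = [j for j in high_idx if j not in first.values()]
    lost = [t for t in tracks if t not in matches]
    return matches, new_candidates, lost
```

Note that an unmatched low-confidence detection appears in none of the three return values: it is simply dropped.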
BoT-SORT: appearance-aware
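```python
# Same call as above (reuses model), switching only the tracker config.
results = model.track(
    "input.mp4",
    persist=True,
    tracker="botsort.yaml",
)
```
BoT-SORT can extract an appearance embedding per detection and use it for re-identification when objects briefly disappear (behind a tree, a pole, or another object). In Ultralytics this is governed by the with_reid setting in botsort.yaml, which may be off by default depending on your installed version, so check the config if you rely on appearance matching. It's slower, sometimes by 30% or more, but if your scene has frequent occlusions, the difference in ID consistency can be dramatic.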
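As a rough illustration of appearance-based re-identification, the sketch below compares a new detection's embedding against embeddings stored for lost tracks using cosine similarity. It is a toy version of the idea rather than BoT-SORT's actual logic: the embeddings are plain lists of floats, and reidentify and min_similarity are hypothetical names and values chosen for this example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def reidentify(lost_tracks, embedding, min_similarity=0.8):
    """Return the id of the lost track that best matches, or None.

    lost_tracks: dict of track_id -> stored embedding.
    min_similarity is an illustrative cutoff, not a library default.
    """
    best_id, best_sim = None, min_similarity
    for tid, stored in lost_tracks.items():
        sim = cosine_similarity(stored, embedding)
        if sim >= best_sim:
            best_id, best_sim = tid, sim
    return best_id
```

A detection that reappears after an occlusion can then inherit the old ID even when its box no longer overlaps the last known position.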
| Choose | When |
|---|---|
| ByteTrack | Open scenes, simple paths, throughput matters |
| BoT-SORT | Crowded scenes, frequent occlusions, IDs must persist |
Using IDs in a pipeline
```python
from collections import defaultdict

from ultralytics import YOLO

model = YOLO("yolo26n.pt")
counts = defaultdict(set)  # class name -> set of track IDs seen so far

for r in model.track("input.mp4", stream=True, persist=True, tracker="bytetrack.yaml"):
    for box in r.boxes:
        if box.id is None:  # detection not yet assigned to a track
            continue
        cls = r.names[int(box.cls)]
        counts[cls].add(int(box.id))

for cls, ids in counts.items():
    print(f"{cls}: {len(ids)} unique objects")
```
That's the whole pipeline. Tracking gives you stable IDs; counting unique IDs gives you counts that hold up on long videos, as long as IDs don't switch (each ID switch inflates the count by one). Built-in solutions like region counting, heatmaps, and distance calculation are all thin layers on top of this loop.
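Dwell time follows the same pattern as counting. The sketch below assumes you have already collected, for each frame, the frame index and the list of integer track IDs visible in it; dwell_times is a hypothetical helper written for this example, and it measures the span from first to last sighting, so frames where the object was briefly missed still count.

```python
def dwell_times(frames, fps):
    """Seconds between first and last sighting of each track ID.

    frames: iterable of (frame_index, list_of_track_ids) pairs.
    fps: frames per second of the source video.
    """
    first_seen, last_seen = {}, {}
    for frame_idx, ids in frames:
        for tid in ids:
            first_seen.setdefault(tid, frame_idx)
            last_seen[tid] = frame_idx
    # +1 so an object seen in a single frame dwells for one frame's duration.
    return {tid: (last_seen[tid] - first_seen[tid] + 1) / fps for tid in first_seen}
```

In the tracking loop above, each frame's ID list could be built from the boxes whose id is not None, paired with a frame counter from enumerate.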
Tuning the tracker
The tracker config files (bytetrack.yaml, botsort.yaml) live in the Ultralytics package. Common knobs:
- track_buffer: how many frames a track survives without a match (default ~30 frames). Larger values are more robust to occlusion but leave more zombie tracks.
- match_thresh: threshold used when matching detections to tracks. In the Ultralytics configs, higher values make matching more lenient, so raise it if motion is fast and boxes jump a lot between frames (check the comments in your installed YAML to confirm the direction).
- new_track_thresh: confidence floor for starting a new track. Raise it if you're getting too many short-lived false tracks.

If a stationary object's ID jumps every few frames, the tracker is creating new tracks instead of matching old ones. Make matching more lenient via match_thresh and raise track_buffer.
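Before changing any knob, it can help to measure how fragmented the IDs actually are. The following sketch counts how many frames each ID appears in and lists the short-lived ones; short_lived_ids and its min_frames cutoff are hypothetical and chosen for illustration. A long list here hints at false tracks or ID switches.

```python
from collections import Counter


def short_lived_ids(frames, min_frames=5):
    """Return the sorted track IDs that appear in fewer than min_frames frames.

    frames: iterable of per-frame lists of integer track IDs.
    """
    seen = Counter(tid for ids in frames for tid in ids)
    return sorted(tid for tid, n in seen.items() if n < min_frames)
```

Running this before and after a config change gives a quick, if crude, signal of whether the change helped.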
Stream sources
Same API for live cameras — the Predict mode docs cover every supported source type:
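```python
# Reuses model from the earlier examples; the RTSP URL is a placeholder.
for r in model.track("rtsp://camera.local/stream", stream=True, persist=True):
    for box in r.boxes:
        # do something with box.id, box.cls, box.xyxy
        pass
```
Just remember: use stream=True for live or long sources. Without it, the results for every frame are collected in a list, so memory grows without bound.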
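The same memory concern applies to whatever you store per ID on a long-running stream. One possible approach for trajectories, sketched here with hypothetical helper names, is to keep only the most recent box centers per track in a bounded deque.

```python
from collections import defaultdict, deque


def make_trajectory_store(max_points=64):
    """Map track_id -> deque holding at most max_points recent centers."""
    return defaultdict(lambda: deque(maxlen=max_points))


def update(store, track_id, box):
    """Append the center of an (x1, y1, x2, y2) box to the track's trajectory."""
    x1, y1, x2, y2 = box
    store[track_id].append(((x1 + x2) / 2, (y1 + y2) / 2))
```

Each trajectory is capped, but the store still gains an entry for every new ID, so a long-lived process would also want to drop IDs that have not been seen for a while.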
Run tracking with both bytetrack.yaml and botsort.yaml on the same video. Compare the saved outputs visually — find a moment where one keeps an ID and the other loses it. That's the difference between the two trackers.
- You can write a tracking loop that uses box.id.
- You've watched the difference between ByteTrack and BoT-SORT on real video.
- You can name a scene type that warrants BoT-SORT and one that doesn't.
Solution
```python
from collections import defaultdict

from ultralytics import YOLO

model = YOLO("runs/detect/forklift_v1/weights/best.pt")
counts = defaultdict(set)  # class name -> set of track IDs seen so far

for r in model.track(
    source="input.mp4",
    persist=True,
    stream=True,
    tracker="botsort.yaml",
    conf=0.4,
):
    for box in r.boxes:
        if box.id is None:  # detection not yet assigned to a track
            continue
        counts[r.names[int(box.cls)]].add(int(box.id))

for cls, ids in counts.items():
    print(f"{cls}: {len(ids)} unique objects")
```
With persistent IDs in hand, we can build the most common production pipelines: counts, heatmaps, and speeds.