Run Inference on Video
From single-image predict to real-time video — and what changes when frames are continuous.
A model that detects in a still image is interesting. A model that detects in video, frame by frame, is what people actually deploy. Video adds two new questions: speed (can you keep up with the frame rate?) and continuity (is the car in frame 30 the same car in frame 31?). Object tracking answers the second by giving each object a persistent ID. We'll handle both.
Run Ultralytics YOLO on a video file or a webcam stream, save annotated output, and use tracking to keep object IDs consistent across frames.
- `model('path/to/video.mp4', save=True)` saves an annotated MP4.
- `model.track(...)` adds persistent object IDs across frames.
- Use `stream=True` for live sources to avoid memory blowup.
- Pick a model size that runs at your target fps (see lesson 3).
Hands-on
A video file
Track mode accepts video paths the same way Predict mode accepts images:
```python
from ultralytics import YOLO

model = YOLO("runs/detect/forklift_v1/weights/best.pt")
model("input.mp4", save=True, conf=0.4)
# Output: runs/detect/predict/input.mp4 (annotated)
```

That decodes the video, runs each frame through the model, draws boxes, and writes a new MP4. Easy. Slow if the video is long.
Webcam or live stream
Pass an integer (webcam index) or an RTSP URL — for a polished UI, the Streamlit live inference guide wraps this same loop in a browser app:
```python
model(0, stream=True, save=True)                             # default webcam
model("rtsp://camera.local/stream", stream=True, save=True)  # IP camera
```

`stream=True` is critical for live or long sources: it processes frames lazily as they arrive, instead of loading everything into memory.
Without `stream=True`, YOLO tries to read the entire source first; a 4-hour CCTV recording will exhaust RAM. With `stream=True` the call returns a generator, and you process one frame at a time.
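A toy illustration of the difference (not the Ultralytics internals): a generator pipeline holds one frame at a time, no matter how long the source is. The `frame_source`, `run_lazy`, and `detect` names here are invented for the sketch.

```python
import itertools

def frame_source(n_frames):
    """Stand-in for a long video: yields one small fake frame at a time."""
    for i in range(n_frames):
        yield {"index": i, "pixels": b"\x00" * 16}

def run_lazy(frames, detect):
    """Process frames as they arrive; only the current frame is in memory."""
    for frame in frames:
        yield detect(frame)

detect = lambda f: {"frame": f["index"], "boxes": []}

# A "4-hour recording" of a million frames costs nothing up front:
results = run_lazy(frame_source(1_000_000), detect)
first_three = list(itertools.islice(results, 3))  # only 3 frames ever decoded
```

An eager version would build a million-element list before the first result came back; the lazy version does constant work per frame.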
Tracking: persistent IDs across frames
Detection alone gives you "there's a car here in frame 30, there's a car here in frame 31." It does not say it's the same car. Multi-object tracking does (and it's the foundation for analytics like object counting, heatmaps, speed estimation, and action recognition):
```python
results = model.track(
    "input.mp4",
    save=True,
    persist=True,
    tracker="bytetrack.yaml",  # or "botsort.yaml"
)
```

Each box now has a `box.id`: an integer that follows the object across frames. Use it for counting (count unique IDs), trajectories, or "this object has been here for 3 seconds" logic.
| Tracker | Speed | Robustness |
|---|---|---|
| `bytetrack.yaml` | Faster, simpler | Loses IDs when objects briefly disappear |
| `botsort.yaml` | Slower | Better at recovering IDs after occlusion (uses appearance features) |
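To build intuition for why a motion-only tracker loses IDs, here is a toy greedy IoU association step. This is a simplification, not the real ByteTrack (which also uses Kalman motion prediction and a two-stage match on low-confidence boxes); the `iou` and `associate` names are invented for the sketch.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def associate(tracks, detections, iou_thresh=0.3):
    """Greedily match each detection to the track box it overlaps most.
    Unmatched detections start new tracks. With no appearance features,
    an object that vanishes for a few frames returns with a fresh ID."""
    next_id = max(tracks, default=0) + 1
    free = dict(tracks)            # id -> box from the previous frame
    assigned = {}
    for det in detections:
        best_id, best_iou = None, iou_thresh
        for tid, box in free.items():
            overlap = iou(det, box)
            if overlap > best_iou:
                best_id, best_iou = tid, overlap
        if best_id is None:        # nothing overlaps enough: new ID
            best_id, next_id = next_id, next_id + 1
        else:
            free.pop(best_id)
        assigned[best_id] = det
    return assigned
```

A detection near a known track keeps its ID; one that reappears far away gets a new one, which is exactly the ID flicker you see after occlusions. Appearance features (as in BoT-SORT) give the matcher a second signal to recover the old ID.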
A frame-by-frame loop
When you need custom logic (count, alert, draw extra overlays), drop into the generator:
```python
from collections import defaultdict

from ultralytics import YOLO

model = YOLO("runs/detect/forklift_v1/weights/best.pt")
per_class_unique_ids = defaultdict(set)

for frame_results in model.track("input.mp4", stream=True, persist=True):
    for box in frame_results.boxes:
        if box.id is None:  # no track ID assigned for this box
            continue
        cls_name = frame_results.names[int(box.cls)]
        per_class_unique_ids[cls_name].add(int(box.id))

for cls, ids in per_class_unique_ids.items():
    print(f"{cls}: {len(ids)} unique objects seen")
```

Performance: keep up with the source
Your effective fps = min(decode fps, model inference fps, write fps). On a modern GPU, model fps is rarely the bottleneck for yolo26n/s; for larger models, it can be.
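That min() relationship, and the frame-skip stride it implies, can be written down directly. A back-of-the-envelope sketch with made-up numbers; the function names are invented here:

```python
import math

def effective_fps(decode_fps, infer_fps, write_fps):
    """The pipeline runs at the speed of its slowest stage."""
    return min(decode_fps, infer_fps, write_fps)

def detect_stride(source_fps, infer_fps):
    """If inference is slower than the source, detect every Nth frame."""
    return max(1, math.ceil(source_fps / infer_fps))

fps = effective_fps(decode_fps=60, infer_fps=22, write_fps=90)  # -> 22
stride = detect_stride(source_fps=30, infer_fps=22)             # -> 2
```

Measure each stage separately before tuning: speeding up a stage that isn't the minimum changes nothing.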
If you can't keep up:
- Smaller model. `n` over `s`, or quantize to INT8 (next lesson).
- Lower `imgsz`. 480 is much faster than 640, with a modest accuracy hit.
- Skip frames. Detect every Nth frame and interpolate boxes between detections (cheap if your tracker handles it).
- GPU. Two orders of magnitude over CPU for the same model; exporting to a TensorRT engine is the usual next step on edge devices.
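The frame-skipping idea above can be sketched in a few lines. A toy version assuming exactly one box per frame and linear motion between detections; `skip_detect` and `detect` are invented names, with `detect` standing in for a real model call:

```python
def skip_detect(frames, detect, stride=3):
    """Detect every `stride`-th frame; linearly interpolate the box in between.
    Toy sketch: assumes exactly one box (x1, y1, x2, y2) per frame."""
    boxes = [None] * len(frames)
    last_i = None
    for i in range(0, len(frames), stride):
        boxes[i] = detect(frames[i])
        if last_i is not None:
            for j in range(last_i + 1, i):          # fill the gap
                t = (j - last_i) / (i - last_i)
                boxes[j] = tuple(a + t * (b - a)
                                 for a, b in zip(boxes[last_i], boxes[i]))
        last_i = i
    for j in range(last_i + 1, len(frames)):        # tail reuses the last box
        boxes[j] = boxes[last_i]
    return boxes
```

Detection cost drops by roughly the stride factor; quality depends on how linear the motion really is between detections.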
Run `model.track('any_video.mp4', save=True)` on a real video. Open the result and check that boxes keep persistent IDs through occlusions. If you see ID flicker, switch the tracker from `bytetrack.yaml` to `botsort.yaml`.
- You've run inference on a video file and saved an annotated output.
- You've used `model.track(...)` and confirmed `box.id` is set.
- You know your effective fps and which knob would speed it up if needed.
Show solution
```python
from ultralytics import YOLO

model = YOLO("runs/detect/forklift_v1/weights/best.pt")
results = model.track(
    source="input.mp4",
    save=True,
    persist=True,
    tracker="botsort.yaml",
    conf=0.4,
)
```

Last lesson: take the trained model out of Python and into the runtime your product actually uses.