
Monitor in Production

Watch latency, detection volume, and drift — and know when to retrain.

A deployed model is a fixed object in a moving world. Cameras get repositioned, lighting changes seasonally, new product variants appear. Without model monitoring, you'll find out about data drift when a customer complains. With the right monitoring — see the model monitoring & maintenance guide — you'll spot it weeks earlier.

Outcome

Use Platform's built-in monitoring to detect latency, accuracy, and drift regressions before they reach users.

Fast Track
If you already know your way around, here's the short version.
  1. Latency dashboard — alert on p95 above your budget.

  2. Detection volume — alert on sudden drops.

  3. Confidence distribution — KS test against baseline weekly.

  4. Holdout mAP — weekly cron, alert on drops > 1 point.

Hands-on

What Platform monitoring covers out of the box

[Screenshot: Ultralytics Platform deploy page overview cards and world map]

Every deployment has a dashboard with:

| Metric | Cadence | What it catches |
| --- | --- | --- |
| Request rate | per second | Traffic spikes / drops |
| p50 / p95 / p99 latency | per minute | Performance regressions |
| Error rate | per minute | Auth, payload, infra issues |
| Detection count per request | per request | Drift indicator |
| Mean confidence | per request | Drift indicator |
| Region distribution | per day | Where the traffic is coming from |

Alerts you should configure

Platform's alert system fires on metric thresholds. The alerts that actually matter:

| Alert | Evaluation window |
| --- | --- |
| p95 latency > budget | 5 minutes consecutive |
| Error rate > 1% | 5 minutes consecutive |
| Detection rate drops > 50% | 10 minutes consecutive |
| Mean confidence shifts > 2σ | Daily |
| Holdout mAP drops > 1 point | Weekly |

Each alert should map to a runbook entry — what does on-call do when this fires?
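For checks Platform does not fire for you, the 2σ mean-confidence alert from the table above is easy to reproduce from exported metrics. A minimal sketch, assuming you can export daily mean-confidence values from the dashboard (the numbers below are placeholders):

```python
import numpy as np

def confidence_shift_alert(baseline_daily_means, today_mean, n_sigmas=2.0):
    """Flag when today's mean confidence sits more than n_sigmas standard
    deviations away from the baseline week's daily means."""
    baseline = np.asarray(baseline_daily_means)
    mu, sigma = baseline.mean(), baseline.std(ddof=1)
    return abs(today_mean - mu) > n_sigmas * sigma

# Placeholder values: daily mean confidences from the week of deployment
baseline_week = [0.71, 0.70, 0.72, 0.69, 0.71, 0.70, 0.72]
if confidence_shift_alert(baseline_week, today_mean=0.61):
    print("ALERT: mean confidence shifted > 2 sigma; follow the runbook entry")
```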

The holdout job

The single most important production check is the weekly holdout validation:

  1. You've reserved 200–500 production-realistic labeled images as a holdout (lesson 4).
  2. Platform schedules a weekly cron that runs validation against the current deployment (a local sketch of this check follows below).
  3. mAP@0.5:0.95 is logged and graphed over time — see the YOLO performance metrics guide for the math.
   mAP@0.5:0.95 over time
   0.62 │ ●●●●●●●●●●●●●
   0.60 │              ●●●●●●●
   0.58 │                     ●●  ← drift!
   0.56 │                       ●●●
        └──────────────────────────────▶  weeks

A clear downtrend over 3+ weeks is your early signal to investigate.
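If you want to run the same check outside Platform's scheduler, here is a minimal sketch using the open-source ultralytics package; the weight and dataset paths and the 0.62 baseline are assumptions for illustration:

```python
from ultralytics import YOLO

HOLDOUT_YAML = "datasets/holdout/data.yaml"   # the 200-500 reserved labeled images
BASELINE_MAP = 0.62                           # mAP@0.5:0.95 recorded at deployment time

model = YOLO("weights/deployed_best.pt")      # the weights currently serving traffic
metrics = model.val(data=HOLDOUT_YAML)
current_map = metrics.box.map                 # mAP@0.5:0.95

print(f"holdout mAP@0.5:0.95 = {current_map:.3f}")
if BASELINE_MAP - current_map > 0.01:         # "drops > 1 point" from the alert table
    print("ALERT: holdout mAP has dropped more than 1 point since deployment")
```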

Drift signals

Two cheap drift signals you should plot weekly — both are also useful for upstream anomaly detection:

  1. Confidence histogram — production model confidences for the current week. Compare to the histogram from deployment week. If it shifts left, the input domain has drifted.
  2. Object size distribution — width × height of detected boxes. Camera repositioning, lens changes, or a new deployment site usually show up here first.

A two-sample KS test comparing the two distributions gives a p-value:

   week 1   ▁▂▄█▇▄▁
   week 12  ▁▃▆█▆▂▁    KS statistic 0.04, p = 0.31    ← stable
   week 24  ▂▅█▇▃▁▁    KS statistic 0.18, p < 0.01    ← drifted

When the p-value crosses your threshold (commonly 0.01), the dashboard turns yellow.
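For reference, the check itself is a few lines with scipy; the confidence arrays below are synthetic placeholders, whereas in practice they come from your logged per-detection confidences:

```python
import numpy as np
from scipy.stats import ks_2samp

# Two-sample KS test between the deployment-week confidence distribution
# and the current week's.
baseline_conf = np.random.beta(8, 3, size=5000)   # deployment week
current_conf = np.random.beta(6, 4, size=5000)    # this week

stat, p_value = ks_2samp(baseline_conf, current_conf)
print(f"KS statistic {stat:.2f}, p = {p_value:.3f}")
if p_value < 0.01:   # threshold from the text above
    print("Drift flagged: confidence distribution has shifted")
```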

When to retrain

Drift signals don't always mean "retrain immediately." Common patterns:

| Pattern | Action |
| --- | --- |
| Confidence shift, mAP holding | Watch — could be benign domain shift the model handles |
| Confidence shift + mAP drop | Retrain. Real accuracy regression. |
| Detection volume drop, confidence stable | Investigate — might be a bug, a camera offline, or a quieter scene |
| Latency regression | Hardware / infra investigation, not the model |

The retrain decision is also a data decision: what new data do you need? Sample drift-flagged frames into the labeling queue first.

A retraining loop in practice

Once you've done one retrain, codify it — this is the continual learning cycle in production form:

  1. Detect drift — alert fires.
  2. Sample drifted frames — pull recent production frames into Platform Discovery.
  3. Auto-annotate + review — run the previous best model as the inference pass, and rank frames for review by active-learning uncertainty.
  4. Retrain — fresh run from the previous best.pt with lr0=0.001 (lesson 8); see the sketch below.
  5. Validate against holdout + recent frames — both must improve.
  6. Deploy with rollout — gradual traffic shift; monitor for regression.
  7. Lock the new baseline — update the holdout with newly labeled frames over time.

The whole loop, on Platform, takes hours of engineer time and days of training/validation. Without Platform — and without a coherent MLOps story — it would take a sprint of glue work.
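Steps 4 and 5 can also be reproduced outside Platform with the open-source ultralytics API; the dataset paths and epoch count here are assumptions for illustration:

```python
from ultralytics import YOLO

# Step 4: fine-tune from the previous best weights at a low learning rate.
model = YOLO("weights/deployed_best.pt")
model.train(data="datasets/retrain/data.yaml",   # original data plus newly labeled drifted frames
            epochs=50, lr0=0.001)

# Step 5: validate against the holdout and a recent-frames set; both must improve
# over the currently deployed model before you roll out.
holdout = model.val(data="datasets/holdout/data.yaml")
recent = model.val(data="datasets/recent_frames/data.yaml")
print(f"holdout mAP@0.5:0.95 {holdout.box.map:.3f}, recent frames {recent.box.map:.3f}")
```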

Try It

Set up two alerts on your deployment: one on p95 latency, one on detection volume drop. Synthetically trigger them — send malformed images to spike errors, or pause traffic to drop detection volume. Confirm the alert fires.
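One way to script the malformed traffic, sketched below; the endpoint URL and auth header are placeholders, and Platform's actual inference API may differ:

```python
import requests

URL = "https://predict.example.com/v1/your-deployment"   # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}        # placeholder credentials

# Deliberately malformed payloads push the error rate above 1% for long enough
# that the 5-minutes-consecutive alert window can trip.
for _ in range(300):
    resp = requests.post(URL, headers=HEADERS, data=b"not-an-image")
    assert resp.status_code >= 400   # each request should fail
print("Malformed traffic sent; confirm the error-rate alert fires on the dashboard")
```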

Done When
You've finished the lesson when all of these are true.
  • Your deployment has at least two configured alerts.

  • You have a weekly holdout validation job running.

  • You can name your retraining trigger — what signal would cause you to retrain.

What's next

Last two lessons: regions/compliance, and the operational shape of a long-running CV project.