Ultralytics YOLO in Production · Choose a Runtime · Lesson 2/9 (intermediate)

Export to ONNX (and Verify Parity)

The portable starting point — and how to confirm the export didn't quietly break the model.

ONNX is the lingua franca of model deployment. Even if you eventually move to TensorRT or CoreML, ONNX export is usually the cleanest first step — most other formats are derived from it. The non-trivial part isn't exporting; it's verifying that the exported model still gives the same answers.

Outcome

Export to ONNX with appropriate options, run the same inference through the .pt and .onnx files, and confirm parity within tolerance.

Fast Track
If you already know your way around, here's the short version.
  1. model.export(format='onnx', dynamic=True, simplify=True).

  2. Re-run a known image through both .pt and .onnx. Counts and confidences should match closely.

  3. Run yolo val on the .onnx file. mAP should be within 0.5 points of the .pt.

  4. If parity fails, check opset, dynamic shapes, and NMS settings.

Hands-on

Export

(Screenshot: the Ultralytics Platform model export format list.)

from ultralytics import YOLO

model = YOLO("runs/detect/forklift_v1/weights/best.pt")
model.export(
    format="onnx",
    dynamic=True,        # variable batch + image size at inference time
    simplify=True,       # run onnx-simplifier on the graph
    opset=17,            # broadly supported by onnxruntime
    half=False,          # keep FP32 for the parity check; switch to FP16 later
)

That writes runs/detect/forklift_v1/weights/best.onnx. Note that NMS is not baked into the graph by default: the exported file stops at raw predictions, and the Ultralytics wrapper applies NMS as a post-processing step when you load the .onnx. Pass nms=True at export time if you want end-to-end NMS inside the file itself.

What each option does

| Option     | Default       | When to change |
|------------|---------------|----------------|
| `dynamic`  | `False`       | Set `True` if batch size or image size varies at inference |
| `simplify` | `True`        | Leave on; onnx-simplifier removes redundant nodes |
| `opset`    | latest stable | Lower (12–15) only for older runtimes that don't support newer ops |
| `half`     | `False`       | `True` for FP16: faster, with a slight accuracy drop |
| `int8`     | `False`       | INT8 quantization (see lesson 3) |
| `nms`      | `False`       | Set `True` to bake NMS into the graph; leave `False` to run NMS yourself in your runtime |
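If you leave NMS out of the exported graph and handle it in your own runtime, it helps to have a reference implementation to sanity-check against. Here is a minimal greedy NMS in plain NumPy — an illustrative sketch, not the exact implementation Ultralytics uses:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS. boxes: (N, 4) in xyxy; scores: (N,). Returns kept indices."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]            # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the top box against all remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]  # drop boxes that overlap too much
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # → [0, 2]  (the second box overlaps the first and is suppressed)
```

Run your runtime's NMS and this reference on the same raw predictions; if the kept indices diverge, check your iou threshold and box format (xyxy vs xywh) first.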

Run the parity check

Don't trust the export. Run a known image through both:

from ultralytics import YOLO

pt = YOLO("runs/detect/forklift_v1/weights/best.pt")
onnx = YOLO("runs/detect/forklift_v1/weights/best.onnx")

img = "test_image.jpg"
pt_result = pt(img, conf=0.25)[0]
onnx_result = onnx(img, conf=0.25)[0]

print(f"PT  : {len(pt_result.boxes)} detections")
print(f"ONNX: {len(onnx_result.boxes)} detections")

# Compare top-5 confidences
pt_conf = sorted([float(b.conf) for b in pt_result.boxes], reverse=True)[:5]
onnx_conf = sorted([float(b.conf) for b in onnx_result.boxes], reverse=True)[:5]
print(f"Top-5 PT  : {pt_conf}")
print(f"Top-5 ONNX: {onnx_conf}")

You should see:

  • The same detection count (one or two borderline boxes near the confidence threshold may flip).
  • Confidences within ~0.01 of each other.
  • The same class predictions in roughly the same boxes.
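The tolerance checks above can be made mechanical so they fit in a script. A sketch — the helper name and the 0.01 tolerance follow this lesson's conventions, they are not an Ultralytics API:

```python
import numpy as np

def confidences_match(pt_conf, onnx_conf, atol=0.01, top_k=5):
    """Compare the top-k confidence scores from the .pt and .onnx runs."""
    a = sorted(pt_conf, reverse=True)[:top_k]
    b = sorted(onnx_conf, reverse=True)[:top_k]
    if len(a) != len(b):                 # detection counts differ within the top-k
        return False
    return bool(np.allclose(a, b, atol=atol))

print(confidences_match([0.91, 0.88, 0.40], [0.912, 0.879, 0.401]))  # → True
print(confidences_match([0.91, 0.88], [0.91, 0.70]))                 # → False
```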

Run validation on the ONNX file

The strongest parity check is full validation, optionally swept across formats with Benchmark mode:

yolo val model=runs/detect/forklift_v1/weights/best.onnx data=my_dataset/data.yaml

mAP@0.5:0.95 should be within 0.5 points of the PyTorch model. More than that and something is wrong.
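Note the units: Ultralytics reports mAP on a 0–1 scale, so "0.5 points" means a delta of 0.005. A tiny gate you could drop into a CI script — the threshold is this lesson's rule of thumb, not a library constant:

```python
def parity_ok(map_pt, map_onnx, max_delta=0.005):
    """True if the exported model's mAP is within 0.5 points of the .pt model."""
    return abs(map_pt - map_onnx) <= max_delta

print(parity_ok(0.712, 0.709))  # → True  (delta 0.003)
print(parity_ok(0.712, 0.690))  # → False (delta 0.022)
```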

Common parity failures
  • Wrong opset. If your runtime doesn't support a newer op, export silently uses a less precise replacement. Set opset to the runtime's max supported version.
  • Dynamic shape disagreement. If you exported with dynamic=False, the graph expects a fixed input shape (typically 1×3×640×640); feeding other sizes will error out or produce garbage. Re-export with dynamic=True or fix the runtime's input size.
  • NMS settings drift. conf and iou are baked into the export when nms=True. Make sure the values match what the runtime will use.

Run with onnxruntime directly

Sometimes you want to skip the Ultralytics wrapper and call the inference engine yourself:

import onnxruntime as ort
import numpy as np
from PIL import Image

session = ort.InferenceSession(
    "best.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# Plain resize skips Ultralytics' letterbox (aspect-preserving pad),
# so boxes won't line up exactly with the wrapper's output
img = Image.open("test.jpg").convert("RGB").resize((640, 640))
arr = np.array(img).transpose(2, 0, 1).astype(np.float32) / 255.0  # HWC -> CHW, scale to [0, 1]
arr = arr[None]    # add batch dimension -> NCHW

outputs = session.run(None, {"images": arr})
print([o.shape for o in outputs])

This is what your production runtime will actually do — the Ultralytics wrapper is for convenience, not the fast path.
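When NMS was not baked into the export, the raw detection output is typically shaped (1, 4 + num_classes, 8400): xywh box coordinates stacked above per-class scores. A decoding sketch on dummy data — the layout assumption matches recent YOLO detection exports, but verify it against your model's actual output shape:

```python
import numpy as np

def decode(raw, conf_thresh=0.25):
    """raw: (1, 4+nc, N). Returns (boxes_xywh, scores, class_ids) above threshold."""
    preds = raw[0].T                      # (N, 4+nc)
    boxes = preds[:, :4]                  # xywh, in input-pixel units
    class_scores = preds[:, 4:]           # (N, nc)
    class_ids = class_scores.argmax(axis=1)
    scores = class_scores.max(axis=1)
    keep = scores > conf_thresh           # confidence filter; NMS still needed after this
    return boxes[keep], scores[keep], class_ids[keep]

# Dummy output: 2 classes, 5 candidate boxes
raw = np.zeros((1, 6, 5), dtype=np.float32)
raw[0, :4, :] = 100.0                     # arbitrary box coordinates
raw[0, 4, 0] = 0.9                        # candidate 0: class 0, conf 0.9
raw[0, 5, 3] = 0.6                        # candidate 3: class 1, conf 0.6
boxes, scores, ids = decode(raw)
print(len(boxes), scores, ids)            # 2 boxes survive the 0.25 threshold
```

After this confidence filter you still run NMS (yours or a reference implementation) to collapse overlapping boxes.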

Try It

Export your model to ONNX. Run yolo val on both the .pt and the .onnx. Confirm mAP is within 0.5. If it's not, change opset to 13 and re-export.

Done When
You've finished the lesson when all of these are true.
  • You have a best.onnx next to your best.pt.

  • Detection counts and confidences match within tolerance.

  • Validation mAP on the ONNX is within 0.5 of the PyTorch model.

Show solution
from ultralytics import YOLO

ckpt = "runs/detect/forklift_v1/weights/best.pt"
model = YOLO(ckpt)
model.export(format="onnx", dynamic=True, simplify=True, opset=17)

onnx_path = ckpt.replace(".pt", ".onnx")
metrics_pt = YOLO(ckpt).val(data="my_dataset/data.yaml")
metrics_onnx = YOLO(onnx_path).val(data="my_dataset/data.yaml")  # exported models need the data arg
print(f"PT   mAP@0.5:0.95 = {metrics_pt.box.map:.3f}")
print(f"ONNX mAP@0.5:0.95 = {metrics_onnx.box.map:.3f}")
print(f"Δ = {abs(metrics_pt.box.map - metrics_onnx.box.map):.3f}")
What's next

Next we'll compile the ONNX model into a TensorRT engine and watch latency roughly halve.