Pose Estimation and OBB
Two specialized tasks and the kinds of problems where they shine.

The three core tasks — classification, detection, segmentation — handle 80% of CV work. The other 20% is where pose estimation and OBB live. Both are easy to dismiss until you hit the exact problem they were made for — like workout monitoring or aerial inspection — and then nothing else works.
Recognize when posture or rotation is the actual signal, and reach for pose or OBB instead of forcing a detector to do their job.
- Pose = keypoints (joints, fixed object points) — when posture or pose is the prediction.
- OBB = rotated rectangles — when objects sit at any angle relative to the frame.
- Both come with their own labels — keypoints or rotated polygons — that aren't interchangeable with plain boxes.
Hands-on
Pose estimation

A pose model returns a list of (x, y) keypoints per detected subject — usually a person, but it works for animals or rigid objects too. The COCO-Pose Person model has 17 keypoints (eyes, ears, shoulders, elbows, wrists, hips, knees, ankles), and YOLO26 pose is the recommended Ultralytics default.
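Downstream code addresses joints by index, so it helps to keep the standard COCO-Pose ordering on hand. A minimal lookup table (the list itself is the fixed COCO convention; the helper function is our own addition for illustration):

```python
# Standard COCO-Pose keypoint order: index i in a [17, 2] keypoint tensor is this joint.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def joint_index(name: str) -> int:
    """Row index of a named joint in a [17, 2] keypoint tensor."""
    return COCO_KEYPOINTS.index(name)

print(joint_index("left_shoulder"))  # 5
```

This is why the snippet below can grab `person[0]`, `person[5]`, and `person[6]` and trust that they are the nose and the two shoulders.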
You don't use pose to answer "is there a person here?" — that's a detection question. You use it when the answer depends on what the person is doing, a form of action recognition:
- Fall detection: a body lying horizontally with low keypoint variance.
- Workplace ergonomics: how often a worker bends at the back rather than the knees.
- Sport analysis: stride length, swing angle, jump height.
- Animal welfare: posture changes that suggest distress in livestock.
- Sign detection / hand tracking: finger positions over time.
```python
from ultralytics import YOLO

model = YOLO("yolo26n-pose.pt")  # checkpoint name assumed; any Ultralytics pose checkpoint works
results = model("athlete.jpg")
keypoints = results[0].keypoints.xy  # tensor: [n_people, 17, 2]
for person in keypoints:
    nose, left_shoulder, right_shoulder = person[0], person[5], person[6]
    print("nose:", nose.tolist(), "shoulders:", left_shoulder.tolist(), right_shoulder.tolist())
```

A pose model still returns boxes — it just adds keypoints inside each box. If you don't need the keypoints, a vanilla detector is faster and cheaper.
Oriented bounding boxes (OBB)
A regular box is four numbers: the coordinates of two opposite corners. An OBB is five: center, width, height, and angle. That extra angle is the whole point — see the DOTAv2 dataset for canonical aerial OBB examples.
OBB shines when objects don't sit parallel to the image frame:
- Aerial / satellite imagery: ships, aircraft on tarmacs, vehicles in parking lots — most are at random angles when seen from above.
- Top-down conveyor systems: packages oriented however they fell on the belt.
- Document analysis: photographed text and forms that aren't perfectly aligned.
- Inventory in warehouses: crates and pallets at angles to the camera.
The reason axis-aligned detectors struggle here isn't accuracy — it's that the axis-aligned box badly overstates the object. A 30-meter ship at 45° fills a box several times the area of the ship itself, with most of the box being water. And NMS can't separate two ships whose boxes overlap by 60% — even if the actual ships never touch.
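The inflation is easy to quantify: the axis-aligned box around a w × h rectangle rotated by θ has sides w·|cos θ| + h·|sin θ| and w·|sin θ| + h·|cos θ|. A sketch with assumed ship dimensions (30 m long, 8 m beam):

```python
import math

def aabb_area_ratio(w, h, theta_deg):
    """Area of the axis-aligned box around a w x h rectangle rotated by theta_deg,
    divided by the rectangle's own area."""
    t = math.radians(theta_deg)
    box_w = w * abs(math.cos(t)) + h * abs(math.sin(t))
    box_h = w * abs(math.sin(t)) + h * abs(math.cos(t))
    return (box_w * box_h) / (w * h)

# A 30 m x 8 m ship (dimensions assumed for illustration) at various headings:
for angle in (0, 15, 30, 45):
    print(f"{angle:>2} deg: box is {aabb_area_ratio(30, 8, angle):.1f}x the ship's area")
```

Even a modest 15° heading already doubles the boxed area; at 45° the box is roughly three times the ship.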
| Use case | Axis-aligned | OBB |
|---|---|---|
| Counting people on a sidewalk | ✅ | overkill |
| Counting docked ships from above | ⚠️ NMS confuses neighbors | ✅ |
| Detecting cars in a city street | ✅ | overkill |
| Detecting cars in a parking lot from above | ⚠️ | ✅ |
| Reading rotated invoices | ⚠️ | ✅ |
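The ⚠️ rows can be demonstrated with pure geometry: two parallel ships moored side by side never touch, yet their axis-aligned boxes overlap well over half their area, which is the regime where NMS starts suppressing real detections. All dimensions here are assumed for illustration:

```python
import math

def rotated_aabb(cx, cy, w, h, theta_deg):
    """Axis-aligned box (xmin, ymin, xmax, ymax) around a rotated w x h rectangle."""
    t = math.radians(theta_deg)
    hw = (w * abs(math.cos(t)) + h * abs(math.sin(t))) / 2
    hh = (w * abs(math.sin(t)) + h * abs(math.cos(t))) / 2
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def box_overlap(a, b):
    """Return (IoU, intersection as a fraction of the smaller box)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter), inter / min(area(a), area(b))

# Two 30 m x 8 m ships heading 45 deg, moored 1 m apart: their centers sit 9 m
# apart along the direction perpendicular to the heading, so the hulls never
# touch (9 m gap > 8 m beam) -- but their axis-aligned boxes overlap heavily.
gap = 9.0
dx = gap * math.cos(math.radians(135))
dy = gap * math.sin(math.radians(135))
box_a = rotated_aabb(0, 0, 30, 8, 45)
box_b = rotated_aabb(dx, dy, 30, 8, 45)
j, frac = box_overlap(box_a, box_b)
print(f"IoU {j:.2f}, overlap {frac:.0%} of each box")  # IoU 0.41, overlap 58% of each box
```

Tighten the mooring toward touching hulls and the overlap climbs past 60%. Rotated boxes of the two ships, by contrast, are disjoint by construction, so OBB-aware NMS keeps both detections.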
Choosing a specialized task
The decision is usually clear once you ask the right question:
- Is the angle of the object meaningful? → OBB.
- Is the posture of the subject the prediction? → Pose.
- If neither — stick with detection or segmentation, optionally with object tracking for cross-frame identity.
Find a real example for each: an aerial photo where OBB would help, a video clip where pose would help. Sketch the difference in expected output between OBB / pose and a regular detector for those examples.
- You can name two scenarios where OBB earns its keep.
- You can name two scenarios where pose is the right answer.
- You haven't reached for either when plain detection would do.
We've picked tasks. Next: the data those tasks live on. Datasets break more projects than models do.