Computer Vision Foundations·Pick a Task·Lesson 5/10

Pose Estimation and OBB

Two specialized tasks and the kinds of problems where they shine.

[Image: pose estimation keypoints on athletes]

The three core tasks — classification, detection, segmentation — handle 80% of CV work. The other 20% is where pose estimation and OBB live. Both are easy to dismiss until you hit the exact problem they were made for — like workout monitoring or aerial inspection — and then nothing else works.

Outcome

Recognize when posture or rotation is the actual signal, and reach for pose or OBB instead of forcing a detector to do their job.

Fast Track
If you already know your way around, here's the short version.
  1. Pose = keypoints (joints, fixed object points) — when posture or pose is the prediction.

  2. OBB = rotated rectangles — when objects sit at any angle relative to the frame.

  3. Both come with their own labels — keypoints or rotated polygons — that aren't interchangeable with boxes.

Hands-on

Pose estimation

[Image: Ultralytics YOLO pose estimation with keypoints]

A pose model returns a list of (x, y) keypoints per detected subject — usually a person, but it works for animals or rigid objects too. The COCO-Pose Person model has 17 keypoints (eyes, ears, shoulders, elbows, wrists, hips, knees, ankles), and YOLO26 pose is the recommended Ultralytics default.

You don't use pose to answer "is there a person here?" — that's a detection question. You use it when the answer depends on what the person is doing, a form of action recognition:

  • Fall detection: a body lying horizontally with low keypoint variance.
  • Workplace ergonomics: how often a worker bends at the back rather than the knees.
  • Sport analysis: stride length, swing angle, jump height.
  • Animal welfare: posture changes that suggest distress in livestock.
  • Sign detection / hand tracking: finger positions over time.
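As a concrete example of the first bullet, a fall-detection heuristic can be computed straight from the keypoints a pose model returns. This is a minimal sketch assuming a COCO-Pose-style list of 17 (x, y) keypoints per person (indices 5/6 = shoulders, 11/12 = hips, y grows downward); the threshold and the toy coordinates are illustrative, and a real system would also track keypoints across frames.

```python
import math

# COCO-Pose keypoint indices (assumed layout)
L_SHOULDER, R_SHOULDER, L_HIP, R_HIP = 5, 6, 11, 12

def torso_angle_deg(kpts):
    """Angle of the shoulder-midpoint -> hip-midpoint vector vs vertical.

    kpts: list of 17 (x, y) pairs in image coordinates (y grows downward).
    0 deg = upright, 90 deg = lying horizontally.
    """
    sx = (kpts[L_SHOULDER][0] + kpts[R_SHOULDER][0]) / 2
    sy = (kpts[L_SHOULDER][1] + kpts[R_SHOULDER][1]) / 2
    hx = (kpts[L_HIP][0] + kpts[R_HIP][0]) / 2
    hy = (kpts[L_HIP][1] + kpts[R_HIP][1]) / 2
    # atan2(horizontal offset, vertical offset) is 0 when the torso is vertical
    return math.degrees(math.atan2(abs(hx - sx), abs(hy - sy)))

def looks_fallen(kpts, threshold_deg=60.0):
    return torso_angle_deg(kpts) > threshold_deg

# Toy data: an upright person (shoulders directly above hips)
upright = [(0, 0)] * 17
upright[L_SHOULDER], upright[R_SHOULDER] = (90, 100), (110, 100)
upright[L_HIP], upright[R_HIP] = (92, 160), (108, 160)

# Toy data: a lying person (shoulders and hips at roughly the same height)
lying = [(0, 0)] * 17
lying[L_SHOULDER], lying[R_SHOULDER] = (100, 200), (100, 214)
lying[L_HIP], lying[R_HIP] = (170, 201), (170, 215)

print(looks_fallen(upright), looks_fallen(lying))  # False True
```

The same pattern — a small geometric rule over keypoints — covers the other bullets too: bend angle at the hips for ergonomics, ankle spacing over time for stride length.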
from ultralytics import YOLO

model = YOLO("yolo11n-pose.pt")  # any Ultralytics pose checkpoint works here
results = model("athlete.jpg")
keypoints = results[0].keypoints.xy  # tensor: [n_people, 17, 2]
for person in keypoints:
    # COCO-Pose indices: 0 = nose, 5 = left shoulder, 6 = right shoulder
    nose, left_shoulder, right_shoulder = person[0], person[5], person[6]
    print("nose:", nose.tolist(), "shoulders:", left_shoulder.tolist(), right_shoulder.tolist())
Pose is detection plus keypoints

A pose model still returns boxes — it just adds keypoints inside each box. If you don't need the keypoints, a vanilla detector is faster and cheaper.

Oriented bounding boxes (OBB)

A regular box is four numbers: two opposite corners (x1, y1, x2, y2). An OBB is five: center x, center y, width, height, and angle. That extra angle is the whole point — see the DOTAv2 dataset for canonical aerial OBB examples.

OBB shines when objects don't sit parallel to the image frame:

  • Aerial / satellite imagery: ships, aircraft on tarmacs, vehicles in parking lots — most are at random angles when seen from above.
  • Top-down conveyor systems: packages oriented however they fell on the belt.
  • Document analysis: photographed text and forms that aren't perfectly aligned.
  • Inventory in warehouses: crates and pallets at angles to the camera.
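To make the five-number representation concrete, here is a minimal sketch that expands (cx, cy, w, h, angle) into the four corner points of the rotated rectangle. It follows no particular library's convention — real OBB formats differ in angle units, sign, and corner ordering — so treat it as illustrative geometry only.

```python
import math

def obb_corners(cx, cy, w, h, angle_deg):
    """Corner points of a rotated rectangle, counter-clockwise.

    (cx, cy) = center, (w, h) = side lengths, angle in degrees.
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    corners = []
    for dx, dy in [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]:
        # rotate the corner offset around the center, then translate
        corners.append((cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a))
    return corners

print(obb_corners(0, 0, 4, 2, 0))
# angle 0 degrades to the familiar axis-aligned box:
# [(-2.0, -1.0), (2.0, -1.0), (2.0, 1.0), (-2.0, 1.0)]
```

Note that many OBB label formats (DOTA among them) store exactly these four corner points rather than the five-number form, and tooling converts between the two.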

The reason axis-aligned detectors struggle here isn't accuracy — it's that the box badly overstates the object. A long, narrow ship at 45° fills an axis-aligned box several times its own area, with most of the box being water. NMS can't separate two ships if their boxes overlap by 60% — even if the actual hulls don't touch.
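You can put a number on that overstatement. The axis-aligned hull of a rotated rectangle has area (|L·cos θ| + |W·sin θ|) × (|L·sin θ| + |W·cos θ|); the sketch below (hypothetical ship dimensions) computes how inflated that hull is relative to the object itself:

```python
import math

def aabb_inflation(length, width, angle_deg):
    """Area of the axis-aligned hull of a rotated rectangle,
    divided by the rectangle's own area."""
    a = math.radians(angle_deg)
    hull_w = abs(length * math.cos(a)) + abs(width * math.sin(a))
    hull_h = abs(length * math.sin(a)) + abs(width * math.cos(a))
    return (hull_w * hull_h) / (length * width)

# A 30 m x 5 m ship seen from above:
print(round(aabb_inflation(30, 5, 0), 2))   # 1.0  -> box fits snugly
print(round(aabb_inflation(30, 5, 45), 2))  # 4.08 -> box is mostly water
```

An OBB keeps that ratio at 1.0 regardless of angle, which is exactly why NMS stops confusing neighboring ships.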

| Use case | Axis-aligned | OBB |
| --- | --- | --- |
| Counting people on a sidewalk | ✅ | overkill |
| Counting docked ships from above | ⚠️ NMS confuses neighbors | ✅ |
| Detecting cars in a city street | ✅ | overkill |
| Detecting cars in a parking lot from above | ⚠️ | ✅ |
| Reading rotated invoices | ⚠️ | ✅ |

Choosing a specialized task

The decision is usually clear once you ask the right question:

  1. Is the angle of the object meaningful? → OBB.
  2. Is the posture of the subject the prediction? → Pose.
  3. If neither — stick with detection or segmentation, optionally with object tracking for cross-frame identity.
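The three questions above collapse into a tiny dispatch helper. The function and its string task names are purely illustrative — not any library's API — but they capture the order of the questions: angle first, posture second, plain detection otherwise.

```python
def choose_task(angle_matters: bool, posture_is_the_prediction: bool,
                needs_pixel_masks: bool = False) -> str:
    """Map this lesson's decision questions to a task name (illustrative only)."""
    if angle_matters:
        return "obb"
    if posture_is_the_prediction:
        return "pose"
    return "segmentation" if needs_pixel_masks else "detection"

print(choose_task(angle_matters=True, posture_is_the_prediction=False))   # obb
print(choose_task(angle_matters=False, posture_is_the_prediction=True))   # pose
print(choose_task(angle_matters=False, posture_is_the_prediction=False))  # detection
```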
Try It

Find a real example for each: an aerial photo where OBB would help, a video clip where pose would help. Sketch the difference in expected output between OBB / pose and a regular detector for those examples.

Done When
You've finished the lesson when all of these are true.
  • You can name two scenarios where OBB earns its keep.

  • You can name two scenarios where pose is the right answer.

  • You haven't reached for either when plain detection would do.

What's next

We've picked tasks. Next: the data those tasks live on. Datasets break more projects than models do.