Pose Estimation and OBB
Two specialized tasks and the kinds of problems where they shine.

The three core tasks — classification, detection, segmentation — handle 80% of CV work. The other 20% is where pose estimation and OBB live. Both are easy to dismiss until you hit the exact problem they were made for — like workout monitoring or aerial inspection — and then nothing else works.
Recognize when posture or rotation is the actual signal, and reach for pose or OBB instead of forcing a detector to do their job.
- Pose = keypoints (joints, fixed object points) — when posture or pose is the prediction.
- OBB = rotated rectangles — when objects sit at any angle relative to the frame.
- Both come with their own labels — keypoints or rotated polygons — that aren't interchangeable with plain boxes.
Hands-on
Pose estimation

A pose model returns a list of (x, y) keypoints per detected subject — usually a person, but it works for animals or rigid objects too. The COCO-Pose Person model has 17 keypoints (eyes, ears, shoulders, elbows, wrists, hips, knees, ankles), and YOLO26 pose is the recommended Ultralytics default.
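Downstream code addresses joints by index, so it helps to keep the standard COCO-Pose ordering on hand. A minimal lookup table (the list itself is the fixed COCO convention; the helper function is our own addition for illustration):

```python
# Standard COCO-Pose keypoint order: index i in a [17, 2] keypoint tensor is this joint.
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

def joint_index(name: str) -> int:
    """Row index of a named joint in a [17, 2] keypoint tensor."""
    return COCO_KEYPOINTS.index(name)

print(joint_index("left_shoulder"))  # 5
```

This is why the snippet below can grab `person[0]`, `person[5]`, and `person[6]` and trust that they are the nose and the two shoulders.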
You don't use pose to answer "is there a person here?" — that's a detection question. You use it when the answer depends on what the person is doing, a form of action recognition:
- Fall detection: a body lying horizontally with low keypoint variance.
- Workplace ergonomics: how often a worker bends at the back rather than the knees.
- Sport analysis: stride length, swing angle, jump height.
- Animal welfare: posture changes that suggest distress in livestock.
- Sign detection / hand tracking: finger positions over time.
```python
from ultralytics import YOLO

model = YOLO("yolo26n-pose.pt")  # checkpoint name assumed; any Ultralytics pose checkpoint works
results = model("athlete.jpg")
keypoints = results[0].keypoints.xy  # tensor: [n_people, 17, 2]
for person in keypoints:
    nose, left_shoulder, right_shoulder = person[0], person[5], person[6]
    print("nose:", nose.tolist(), "shoulders:", left_shoulder.tolist(), right_shoulder.tolist())
```

A pose model still returns boxes — it just adds keypoints inside each box. If you don't need the keypoints, a vanilla detector is faster and cheaper.
Oriented bounding boxes (OBB)
A regular box is four numbers: the coordinates of two opposite corners. An OBB is five: center, width, height, and angle. That extra angle is the whole point — see the DOTAv2 dataset for canonical aerial OBB examples.
OBB shines when objects don't sit parallel to the image frame:
- Aerial / satellite imagery: ships, aircraft on tarmacs, vehicles in parking lots — most are at random angles when seen from above.
- Top-down conveyor systems: packages oriented however they fell on the belt.
- Document analysis: photographed text and forms that aren't perfectly aligned.
- Inventory in warehouses: crates and pallets at angles to the camera.
The reason axis-aligned detectors struggle here isn't accuracy — it's that the axis-aligned box badly overstates the object. A 30-meter ship at 45° fills a box several times the area of the ship itself, with most of the box being water. And NMS can't separate two ships whose boxes overlap by 60% — even if the actual ships never touch.
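The inflation is easy to quantify: the axis-aligned box around a w × h rectangle rotated by θ has sides w·|cos θ| + h·|sin θ| and w·|sin θ| + h·|cos θ|. A sketch with assumed ship dimensions (30 m long, 8 m beam):

```python
import math

def aabb_area_ratio(w, h, theta_deg):
    """Area of the axis-aligned box around a w x h rectangle rotated by theta_deg,
    divided by the rectangle's own area."""
    t = math.radians(theta_deg)
    box_w = w * abs(math.cos(t)) + h * abs(math.sin(t))
    box_h = w * abs(math.sin(t)) + h * abs(math.cos(t))
    return (box_w * box_h) / (w * h)

# A 30 m x 8 m ship (dimensions assumed for illustration) at various headings:
for angle in (0, 15, 30, 45):
    print(f"{angle:>2} deg: box is {aabb_area_ratio(30, 8, angle):.1f}x the ship's area")
```

Even a modest 15° heading already doubles the boxed area; at 45° the box is roughly three times the ship.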
| Use case | Axis-aligned | OBB |
|---|---|---|
| Counting people on a sidewalk | ✅ | overkill |
| Counting docked ships from above | ⚠️ NMS confuses neighbors | ✅ |
| Detecting cars in a city street | ✅ | overkill |
| Detecting cars in a parking lot from above | ⚠️ | ✅ |
| Reading rotated invoices | ⚠️ | ✅ |
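The ⚠️ rows can be demonstrated with pure geometry: two parallel ships moored side by side never touch, yet their axis-aligned boxes overlap well over half their area, which is the regime where NMS starts suppressing real detections. All dimensions here are assumed for illustration:

```python
import math

def rotated_aabb(cx, cy, w, h, theta_deg):
    """Axis-aligned box (xmin, ymin, xmax, ymax) around a rotated w x h rectangle."""
    t = math.radians(theta_deg)
    hw = (w * abs(math.cos(t)) + h * abs(math.sin(t))) / 2
    hh = (w * abs(math.sin(t)) + h * abs(math.cos(t))) / 2
    return (cx - hw, cy - hh, cx + hw, cy + hh)

def box_overlap(a, b):
    """Return (IoU, intersection as a fraction of the smaller box)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter), inter / min(area(a), area(b))

# Two 30 m x 8 m ships heading 45 deg, moored 1 m apart: their centers sit 9 m
# apart along the direction perpendicular to the heading, so the hulls never
# touch (9 m gap > 8 m beam) -- but their axis-aligned boxes overlap heavily.
gap = 9.0
dx = gap * math.cos(math.radians(135))
dy = gap * math.sin(math.radians(135))
box_a = rotated_aabb(0, 0, 30, 8, 45)
box_b = rotated_aabb(dx, dy, 30, 8, 45)
j, frac = box_overlap(box_a, box_b)
print(f"IoU {j:.2f}, overlap {frac:.0%} of each box")  # IoU 0.41, overlap 58% of each box
```

Tighten the mooring toward touching hulls and the overlap climbs past 60%. Rotated boxes of the two ships, by contrast, are disjoint by construction, so OBB-aware NMS keeps both detections.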
Choosing a specialized task
The decision is usually clear once you ask the right question:
- Is the angle of the object meaningful? → OBB.
- Is the posture of the subject the prediction? → Pose.
- If neither — stick with detection or segmentation, optionally with object tracking for cross-frame identity.
Find a real example for each: an aerial photo where OBB would help, a video clip where pose would help. Sketch the difference in expected output between OBB / pose and a regular detector for those examples.
- You can name two scenarios where OBB earns its keep.
- You can name two scenarios where pose is the right answer.
- You haven't reached for either when plain detection would do.
We've picked tasks. Next: the data those tasks live on. Datasets break more projects than models do.