Computer Vision Foundations

Build the vocabulary, intuition, and first hands-on reflexes for modern computer vision — tasks, datasets, metrics, and what they mean for a real product.

By Ultralytics Academy


What you'll learn

Frame a computer vision problem, choose the right task and labels, evaluate a model honestly, and know what "good" looks like before you train.

Match a product need to the right vision task — classification, detection, segmentation, pose, or OBB.
Decide between bounding boxes and segmentation masks based on shape, cost, and downstream use.
Plan datasets and annotation guidelines that match deployment conditions.
Read precision, recall, and mAP together with examples to find the next improvement.
Recognize the difference between training, validation, and test data — and why people get this wrong.

What you'll build

A one-page task spec for a vision project: what the model predicts, what it ignores, what shape the output takes.
A dataset plan that lists splits, edge cases, label rules, and review criteria.
A small evaluation worksheet that converts metrics into a decision: ship, iterate, or rethink.
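As a rough illustration of the evaluation worksheet idea, the sketch below maps two metrics to one of the three decisions named above. The function name `decide` and every threshold in it are hypothetical placeholders, not values from the course; real thresholds depend on what a miss or a false alarm costs in the product.

```python
def decide(precision, recall, min_precision=0.90, min_recall=0.80, rethink_below=0.50):
    """Map precision and recall to 'ship', 'iterate', or 'rethink'.

    All thresholds are hypothetical placeholders chosen for illustration.
    """
    # Both metrics clear the bar the product needs: ship.
    if precision >= min_precision and recall >= min_recall:
        return "ship"
    # Either metric is far below the bar: the task framing or the data
    # may be wrong, so step back instead of tuning.
    if precision < rethink_below or recall < rethink_below:
        return "rethink"
    # Otherwise the model is close enough that another round is worthwhile.
    return "iterate"


if __name__ == "__main__":
    print(decide(0.95, 0.85))
    print(decide(0.92, 0.60))
    print(decide(0.40, 0.90))
```

Writing the thresholds down before training is the point of the exercise: the decision rule is fixed first, and the metrics are read against it afterwards.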

Prerequisites

Basic Python familiarity is helpful but not required — this course is concept-first.
Comfort installing software locally if you want to follow the optional ultralytics install in lesson 10.

Course content

4 modules · 10 lessons

Module 1

Frame the Problem

Turn a fuzzy product idea into a single sentence the model can answer.

Module 2

Pick a Task

The Five Vision Tasks

Match an output sentence to classification, detection, segmentation, pose, or oriented bounding boxes.

Understand Object Detection

Boxes, classes, and confidence — what detection actually returns and what it does not.
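To make "boxes, classes, and confidence" concrete, here is a minimal sketch of what a single detection might look like as data. The `Detection` class and `filter_by_confidence` helper are hypothetical names used for illustration only, not types from any library, and the corner-coordinate box layout is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    # Hypothetical container: box corners in pixels (assumed layout),
    # a class label, and a confidence score between 0 and 1.
    x1: float
    y1: float
    x2: float
    y2: float
    class_name: str
    confidence: float


def filter_by_confidence(detections, threshold=0.5):
    """Keep only detections whose confidence is at or above the threshold."""
    return [d for d in detections if d.confidence >= threshold]


if __name__ == "__main__":
    detections = [
        Detection(10, 20, 110, 220, "person", 0.91),
        Detection(300, 40, 380, 120, "dog", 0.32),
    ]
    for d in filter_by_confidence(detections, threshold=0.5):
        print(d.class_name, d.confidence)
```

Note what the structure leaves out: there is no outline of the object, no orientation, and no body keypoints. A detection is a rectangle, a label, and a score.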

Boxes vs Masks: When You Actually Need Segmentation

Move from rectangles to pixels — and only when the shape genuinely matters.

Pose Estimation and OBB

Two specialized tasks and the kinds of problems where they shine.

Module 3

Datasets and Labels

Datasets that Reflect Reality

The model only knows what's in the dataset. Build the dataset like the model depends on it.

Annotation Quality

Half a million inconsistent boxes are worse than fifty thousand consistent ones.

Module 4

Read the Metrics

Splits that Tell the Truth

Why training, validation, and test splits exist — and the leak that ruins half of them.
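One common form of leakage, assumed here for illustration, is near-duplicate images from the same source (for example, frames of one video) landing in both training and evaluation splits. A minimal sketch of one way to avoid it is to assign splits per group rather than per image. The function names, the group ids, and the 80/10/10 proportions below are all hypothetical.

```python
import hashlib


def assign_split(group_id, val_fraction=0.1, test_fraction=0.1):
    """Deterministically map a group id (e.g. a video or site name) to a split."""
    digest = hashlib.sha256(group_id.encode("utf-8")).hexdigest()
    # Turn the first 32 bits of the hash into a number in [0, 1).
    bucket = int(digest[:8], 16) / 2**32
    if bucket < test_fraction:
        return "test"
    if bucket < test_fraction + val_fraction:
        return "val"
    return "train"


def split_by_group(items):
    """items: iterable of (filename, group_id) pairs. Returns a dict of filename lists."""
    splits = {"train": [], "val": [], "test": []}
    for filename, group_id in items:
        # Every file in a group follows its group, so a group never straddles splits.
        splits[assign_split(group_id)].append(filename)
    return splits


if __name__ == "__main__":
    items = [
        ("a_001.jpg", "video_a"),
        ("a_002.jpg", "video_a"),
        ("b_001.jpg", "video_b"),
    ]
    print(split_by_group(items))
```

Hashing the group id keeps the assignment stable as the dataset grows, though with only a few groups the realized proportions can drift far from the requested fractions.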

Reading Detection Metrics Honestly

Precision, recall, mAP — what they hide and how to look at them together.
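As a small worked sketch of the quantities behind these metrics, the code below computes intersection over union (IoU) for two boxes and precision and recall from match counts. The function names and the corner-coordinate box format are assumptions for illustration. mAP is not computed here; it builds on these pieces by averaging precision across recall levels and classes.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped to zero when boxes do not touch.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(precision_recall(tp=8, fp=2, fn=4))
```

In the second call, 8 of 10 predictions are correct (precision 0.8) while 8 of 12 real objects are found (recall about 0.67), which is why the two numbers need to be read together.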

From Lab to Production

The smallest end-to-end loop: install, predict, measure, decide.
