Lesson · Beginner

Define the Dataset Specification

Plan the dataset on paper before a single image is collected — scenarios, environments, edge cases, and negatives.

A dataset specification is the bridge between the business objective and the camera. It enumerates what visual variation must appear in the dataset for the model to be deployable. Skipping this step is how teams end up with 5,000 well-lit warehouse photos and a model that breaks at dusk on day one. The data collection and annotation guide covers the underlying class-count and bias decisions in depth.

Outcome

Produce a written dataset specification listing target classes, scenarios, environments, edge cases, and negative examples — with rough quotas.

Fast Track
If you already know your way around, here's the short version.
  1. Enumerate scenarios: each combination of camera × location × time × condition.

  2. Set quantitative targets: ≥ 1,500 images per class, ≥ 10,000 labeled instances per class.

  3. Plan for edge cases up front: occluded, partial, small, stacked, unusual orientations.

  4. Reserve 0–10% background images (no objects) — reduces false positives.

Hands-on

What a dataset spec looks like

[Image: Ultralytics Platform dataset upload dialog]

The dataset spec slots into the wider lifecycle covered in steps of a CV project — between the business objective (lesson 1) and the camera:

graph LR
    A[Business<br/>objective] --> B[Vision task<br/>+ classes]
    B --> C[Dataset<br/>specification]
    C --> D[Collect]
    D --> E[Label]
    E --> F[Split + QC]
    F --> G[Fine-tune]
    G --> H{Meets<br/>metric?}
    H -- no --> C
    H -- yes --> I[Deploy]

    style C fill:#FF9800,color:#fff
    style F fill:#2196F3,color:#fff
    style G fill:#9C27B0,color:#fff
    style I fill:#4CAF50,color:#fff

A working spec is a table — one row per scenario. A scenario is a unique combination of capture variables that the model must handle:

   scenario      camera     location      time-of-day   weather       quota   notes
   ─────────────────────────────────────────────────────────────────────────────────────
   dock-day      cam-1..4   bay-A,B,C     08:00–17:00   any           1200    busiest hours
   dock-night    cam-1..4   bay-A,B,C     22:00–04:00   any            400    fewer ops, harder lighting
   loading-rain  cam-2,3    bay-B          all          rain only      200    rare but critical
   stacked       any        any            any          any            150    pallets stacked > 2 high
   negatives     any        any            any          any            150    empty docks, no targets

Scenarios make it concrete. A scenario you can't write down is a scenario the model won't see in training and will fail on in production.
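A spec table like the one above can also live as data, so collection scripts and progress dashboards can read it directly. A minimal sketch — the `Scenario` class and the row values are illustrative, mirroring a few rows of the table above:

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    """One row of the dataset spec: a unique combination of capture variables."""
    name: str
    cameras: list[str]
    locations: list[str]
    time_of_day: str
    weather: str
    quota: int
    notes: str = ""

# A few rows from the table above, expressed as data.
SPEC = [
    Scenario("dock-day", ["cam-1", "cam-2", "cam-3", "cam-4"],
             ["bay-A", "bay-B", "bay-C"], "08:00-17:00", "any", 1200,
             "busiest hours"),
    Scenario("dock-night", ["cam-1", "cam-2", "cam-3", "cam-4"],
             ["bay-A", "bay-B", "bay-C"], "22:00-04:00", "any", 400,
             "fewer ops, harder lighting"),
    Scenario("negatives", ["any"], ["any"], "all", "any", 150,
             "empty docks, no targets"),
]

# The sum of quotas is the total collection budget for these rows.
total_quota = sum(s.quota for s in SPEC)
```

Keeping the spec machine-readable means the same rows that get reviewed on paper can later drive per-scenario progress tracking during collection.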

Quantity targets that actually predict good results

The Ultralytics Tips for Best Training Results recommend, as a rule of thumb for production:

   target                          recommendation
   ──────────────────────────────────────────────────────────────────
   images per class                ≥ 1,500
   labeled instances per class     ≥ 10,000
   background images (no labels)   0–10% of total — reduces false positives
   variance                        different times, seasons, weather,
                                   lighting, angles, sources, cameras

Smaller datasets work — narrow tasks (one class, one camera) often hit acceptable accuracy with 200–500 images. But for production, 1,500 images / 10,000 instances per class is the line below which you should expect to need active learning rounds before the model is shippable.
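A quick way to make these thresholds actionable is a helper that reports which classes still fall short. A minimal sketch — `quota_gaps` and the example counts are hypothetical, not part of any library:

```python
# Rule-of-thumb production targets from the recommendations above.
MIN_IMAGES_PER_CLASS = 1500
MIN_INSTANCES_PER_CLASS = 10_000

def quota_gaps(images_per_class: dict[str, int],
               instances_per_class: dict[str, int]) -> dict[str, str]:
    """Return a reason string for every class that misses a target."""
    gaps = {}
    for cls, n_img in images_per_class.items():
        n_inst = instances_per_class.get(cls, 0)
        reasons = []
        if n_img < MIN_IMAGES_PER_CLASS:
            reasons.append(f"images {n_img}/{MIN_IMAGES_PER_CLASS}")
        if n_inst < MIN_INSTANCES_PER_CLASS:
            reasons.append(f"instances {n_inst}/{MIN_INSTANCES_PER_CLASS}")
        if reasons:
            gaps[cls] = ", ".join(reasons)
    return gaps

# Hypothetical counts: "pallet" meets both targets, "forklift" misses both.
gaps = quota_gaps({"pallet": 1800, "forklift": 900},
                  {"pallet": 12000, "forklift": 4000})
```

Running this after each collection round tells you exactly where the next batch of images should come from.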

The variance dimensions

For every dimension below, write down whether your dataset spec covers it. Empty cells are scenarios you'll discover in production:

   dimension          examples
   ──────────────────────────────────────────────────────────────────
   time               day / night / dawn / dusk
   weather            sunny / cloudy / rain / fog / snow
   lighting           bright / dim / mixed / glare / shadow
   camera             make, model, lens, mounting height, FoV
   geography          site A / site B / different cities / countries
   operators          different shifts, different teams, different uniforms
   product variants   different SKUs, packaging revisions, color variants
   object size        near (large in frame), mid, far (small)
   occlusion          clear / partial / heavy / behind glass
   failure modes      mislabeled boxes, broken pallets, dirty cameras

Variance matters more than volume. 5,000 images from one camera at noon are worth less than 1,500 images that span the table above. The data collection and annotation guide goes deeper on diverse sourcing.
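Auditing coverage across these dimensions can be automated: tag each scenario row with the variance values it covers, then diff against the full set of values per dimension. A minimal sketch, assuming hypothetical per-scenario tags (the dimension values here are a subset of the table above):

```python
# Which values each variance dimension should cover (subset for illustration).
DIMENSIONS = {
    "time": {"day", "night", "dawn", "dusk"},
    "weather": {"sunny", "cloudy", "rain", "fog", "snow"},
    "occlusion": {"clear", "partial", "heavy"},
}

# Hypothetical tags attached to each scenario row of the spec.
scenario_tags = [
    {"time": "day", "weather": "sunny", "occlusion": "clear"},
    {"time": "night", "weather": "sunny", "occlusion": "clear"},
    {"time": "day", "weather": "rain", "occlusion": "partial"},
]

def uncovered(dimensions: dict[str, set], tags: list[dict]) -> dict[str, list]:
    """For each dimension, list the values no scenario row covers."""
    covered = {dim: {t[dim] for t in tags if dim in t} for dim in dimensions}
    return {dim: sorted(values - covered[dim])
            for dim, values in dimensions.items()}

gaps = uncovered(DIMENSIONS, scenario_tags)
# Each entry in `gaps` is a blind spot: a value the model will first
# encounter in production rather than in training.
```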

Don't forget background and edge cases

Two categories that always look optional and always come back to bite teams:

  1. Background images — frames with no labeled objects. They teach the model "nothing is here." Aim for 0–10% of the dataset (COCO has ~1%). Without them, the model invents detections in empty scenes.
  2. Edge cases — heavy occlusion, partial objects at frame edges, unusual orientations, stacked objects, look-alikes (a forklift's mast vs. a column). Reserve a quota line for each edge case in the spec.
   common               edge cases
   ┌──────────────┐     ┌──────────────┐
   │ ▪▪▪▪▪▪▪▪▪▪▪▪ │     │ ░ ▪ ░ ░ ▪ ░ │   ← rare but
   │ ▪▪▪▪▪▪▪▪▪▪▪▪ │     │ ░ ░ ░ ▪ ░ ░ │     diagnostic
   │ (1500+ each) │     │ ░ ▪ ░ ░ ░ ▪ │
   └──────────────┘     └──────────────┘
   model trains on      model fails on
   these by default     these in prod

A good spec over-samples edge cases on purpose — they're rare in the wild, so a stratified collection makes them visible during training.
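Whether the background quota is actually being met is easy to measure once collection starts. A minimal sketch assuming the YOLO labeling convention, where an empty `.txt` label file marks a background image (the directory layout here is illustrative):

```python
import tempfile
from pathlib import Path

def background_fraction(labels_dir: str) -> float:
    """Fraction of images whose YOLO label file is empty (no objects)."""
    label_files = list(Path(labels_dir).glob("*.txt"))
    if not label_files:
        return 0.0
    empty = sum(1 for f in label_files if f.stat().st_size == 0)
    return empty / len(label_files)

# Demo on a throwaway directory: 9 labeled frames, 1 background frame.
with tempfile.TemporaryDirectory() as d:
    for i in range(9):
        Path(d, f"img{i}.txt").write_text("0 0.5 0.5 0.2 0.2\n")
    Path(d, "img9.txt").write_text("")  # background: empty label file
    frac = background_fraction(d)

# frac lands at 10% -- the top of the recommended 0-10% window.
```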

Try It

Draft the dataset spec for your project as a table: 5–10 scenario rows, with quotas. Show it to someone who knows the deployment site (a foreman, a shift lead, a customer). They'll add 2–3 scenarios you missed — that's the value.
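If the reviewer prefers a spreadsheet, the spec rows export to CSV in a few lines. A minimal sketch with hypothetical rows in the same column order as the scenario table earlier in the lesson:

```python
import csv
import io

# Hypothetical scenario rows: name, camera, location, time, weather, quota.
rows = [
    ("dock-day", "cam-1..4", "bay-A,B,C", "08:00-17:00", "any", 1200),
    ("negatives", "any", "any", "all", "any", 150),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["scenario", "camera", "location",
                 "time_of_day", "weather", "quota"])
writer.writerows(rows)
csv_text = buf.getvalue()  # paste into a sheet, or write to spec.csv
```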

Done When
You've finished the lesson when all of these are true.
  • Your spec lists ≥ 5 scenarios with explicit quotas.

  • You've enumerated variance across at least 5 of the 10 dimensions above.

  • You've reserved a row for background images.

  • You've reserved at least one row per identified edge case.

What's next

Plan in hand. Next: actually collect images that match it.