Sampling Techniques

Choosing a representative subset without distorting the signal.

When data is too large or too skewed, you work with a sample. How you draw that sample shapes everything the model learns, so the method matters.

Common methods

  • Random sampling picks examples uniformly at random. It is simple, but rare groups may vanish from a small sample.
  • Stratified sampling splits the population into groups and samples each group in proportion to its size, so the sample keeps the population's group proportions.
  • Reservoir sampling draws a fixed-size sample from a stream of unknown length in a single pass, giving every item the same chance of being kept.
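
The last two methods can be sketched in a few lines of standard-library Python. This is a minimal illustration rather than a reference implementation: the function names, the `key` callback, and the rule of keeping at least one record per group are assumptions made for the example, and the reservoir routine follows the classic approach usually called Algorithm R.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, rng):
    """Sample the same fraction from every group defined by key(record)."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)

    sample = []
    for members in groups.values():
        # Assumption for this sketch: keep at least one record per group
        # so that a rare group never disappears entirely.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def reservoir_sample(stream, k, rng):
    """Keep a uniform sample of k items from a stream, in one pass."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i is kept with probability k / (i + 1); if kept,
            # it overwrites a uniformly chosen slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


rng = random.Random(0)
# Hypothetical data: 990 "common" records and 10 "rare" ones.
records = [("common", n) for n in range(990)] + [("rare", n) for n in range(10)]
stratified = stratified_sample(records, key=lambda r: r[0], fraction=0.1, rng=rng)
streamed = reservoir_sample(iter(range(1000)), k=5, rng=rng)
```

With a 10% fraction the stratified draw keeps 99 common records and 1 rare record, so the rare group survives. The reservoir never holds more than `k` items, which is why it works when the stream length is unknown in advance.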

Avoiding bias

A sample is only useful if it is representative. Convenience samples, such as only the most recent or easiest to reach records, quietly bias the model. If you sample only daytime traffic, the model never learns nighttime behavior.
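
One simple way to spot this kind of bias is to compare each group's share of the population with its share of the sample. The sketch below does that for the daytime traffic example. The traffic log, the 08:00 to 20:00 definition of "day", and the helper names are all hypothetical, chosen only to make the gap visible.

```python
from collections import Counter


def group_shares(records, key):
    """Return each group's share of the records as a fraction of the total."""
    counts = Counter(key(r) for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def share_gaps(population, sample, key):
    """Population share minus sample share, for every group in the population."""
    pop = group_shares(population, key)
    smp = group_shares(sample, key)
    return {group: share - smp.get(group, 0.0) for group, share in pop.items()}


def period(record):
    """Label a (hour, request_id) record as "day" or "night" (assumed cutoffs)."""
    hour = record[0]
    return "day" if 8 <= hour < 20 else "night"


# Hypothetical traffic log: 10 requests for each of the 24 hours.
population = [(hour, i) for i in range(10) for hour in range(24)]
# Convenience sample: only the daytime records.
daytime_only = [r for r in population if period(r) == "day"]

gaps = share_gaps(population, daytime_only, period)
```

Here night traffic makes up half of the population and none of the sample, so its gap is 0.5. A gap near zero for every group is a rough sign that the sample is representative along that dimension. It says nothing about dimensions you did not check.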

Sampling and evaluation

Sampling also matters at evaluation. A test set drawn from a different time period or population than production gives a misleading, often optimistic, read on quality. Keeping the sampling strategy explicit and documented lets others judge whether conclusions generalize.
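
One lightweight way to keep the strategy explicit is to record it as data next to the sample itself. The sketch below is only an illustration of that idea: the `SamplingSpec` fields, the `draw` helper, and the example values (including the source period) are hypothetical, not a standard format.

```python
import json
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SamplingSpec:
    method: str         # e.g. "random" or "stratified"
    fraction: float     # share of the population to keep
    seed: int           # makes the draw reproducible
    source_period: str  # time window the records came from


def draw(records, spec):
    """Uniform random sample driven entirely by the recorded spec."""
    rng = random.Random(spec.seed)
    k = round(len(records) * spec.fraction)
    return rng.sample(records, k)


# Hypothetical values, for illustration only.
spec = SamplingSpec(method="random", fraction=0.2, seed=7, source_period="2024-01")
test_set = draw(list(range(100)), spec)

# Store this next to the evaluation results so others can see how the set was drawn.
spec_json = json.dumps(asdict(spec))
```

Because the seed and fraction are part of the record, anyone can redraw the same test set and check whether its time period and population match production.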

Key idea

Sampling must produce a representative subset. Stratified sampling helps when random sampling would drop rare groups, and reservoir sampling helps when the data arrives as a stream of unknown length.

Check yourself

1. What does stratified sampling preserve?

2. When is reservoir sampling useful?