← Lessons

quiz vs the machine

Platinum1720

Algorithms

Reservoir Sampling

Pick a uniform random sample from a stream of unknown length.

5 min read · advanced · beat Platinum to climb

Sampling from an unbounded stream

Suppose items arrive one at a time and you must keep a uniform random sample of fixed size, but you never learn the total count in advance and cannot store everything. Reservoir sampling solves this with a small fixed buffer called the reservoir, deciding on the fly whether each new item replaces an existing one.

The single item case

To keep just one item uniformly, hold the first item, then for the item at position k replace the held item with probability one over k.

  • The first item starts as the sample.
  • The second replaces it with probability one half.
  • The kth item replaces the current sample with probability one over k.

A short induction shows that after processing n items, every item is held with probability exactly one over n, which is uniform.

Larger reservoirs

For a sample of size m, fill the reservoir with the first m items. Then for the item at position k beyond m, keep it with probability m over k, and if kept, evict a uniformly chosen current member. The same uniformity argument extends to size m.

Key idea

Replace reservoir members with shrinking probability as the stream grows, so every item ends up equally likely without ever knowing the total count.

Check yourself

Answer to earn rating on the learn ladder.

1. With what probability does the kth item replace a single held sample?

2. What makes reservoir sampling suited to streams?