Machine Learning

The Reproducibility Seeds

Control randomness so a result can be rerun and trusted.

Randomness everywhere

Training draws on many sources of randomness: weight initialization, data shuffling, dropout, and augmentation. To reproduce a run, you must fix the seeds that drive all of them.

  • Set seeds for the language, the array library, and the framework.
  • Control data ordering and split seeds.
  • Record library versions and hardware where relevant.
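
The checklist above can be sketched as a small helper. This is a minimal sketch, not a definitive recipe: the names `seed_everything` and `split_indices` are hypothetical, and NumPy and PyTorch are assumed only as examples of an array library and a framework, seeded only if they happen to be installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed the random sources available in this environment (hypothetical helper)."""
    # Python's built-in generator (random.shuffle, random.random, ...).
    random.seed(seed)

    # NumPy's legacy global generator, if NumPy is installed.
    # Generators created with np.random.default_rng need their own seed.
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass

    # PyTorch's default generators, if PyTorch is installed.
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


def split_indices(n: int, val_fraction: float, seed: int):
    """Return (train, val) index lists from a dedicated, seeded generator."""
    # A separate Random instance keeps the split independent of any
    # other random calls made elsewhere in the program.
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_val = int(n * val_fraction)
    return indices[n_val:], indices[:n_val]


seed_everything(42)
train_idx, val_idx = split_indices(n=10, val_fraction=0.2, seed=42)
```

Giving the split its own seed means the same examples land in the validation set even if the training code later changes how many random numbers it consumes.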

Seeds are not the whole story

A fixed seed makes one run repeatable, but a single seed can also mislead. A good score on seed 42 may be luck.

  • Report results across several seeds with mean and spread.
  • Distinguish a real gain from seed noise.
  • Note that some GPU operations remain nondeterministic even with fixed seeds, so small run-to-run differences can persist.
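
Reporting across seeds might look like the sketch below. `train_and_evaluate` is a hypothetical stand-in that returns a made-up score with seed-dependent noise; in practice it would be a real training run. The spread here is the sample standard deviation, which is one common choice among several.

```python
import random
import statistics


def train_and_evaluate(seed: int) -> float:
    """Hypothetical stand-in for a real training run; returns a validation score."""
    rng = random.Random(seed)
    # Made-up score: a fixed base plus seed-dependent noise.
    return 0.80 + rng.gauss(0.0, 0.01)


def summarize(seeds):
    """Run once per seed and return (mean, sample standard deviation)."""
    scores = [train_and_evaluate(s) for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores)


mean, spread = summarize([0, 1, 2, 3, 4])
print(f"score: {mean:.3f} ± {spread:.3f} over 5 seeds")
```

If two methods differ by less than the spread, the gap may be seed noise rather than a real gain.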

What to pin

Pin the seeds, the library versions, and, where relevant, the hardware details. Together they make a run rerunnable.
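
One way to record what was pinned is to save a small manifest next to the results. This is a sketch under assumptions: `run_manifest` and its field names are hypothetical, and the package names passed in are only examples.

```python
import json
import platform
import sys
from importlib import metadata


def run_manifest(seed: int, packages):
    """Collect the seed, Python version, platform, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "seed": seed,
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


# Example package names; replace with whatever the run actually imports.
print(json.dumps(run_manifest(42, ["numpy", "torch"]), indent=2))
```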

Key idea

Fixing seeds across the language, libraries, and framework makes a run repeatable, but report results over several seeds so you can tell a real improvement from random variation.

Check yourself

1. Why fix random seeds before a training run?

2. Why report results over multiple seeds?