Why one run is not enough
A correct looking concurrent function may fail only under a specific interleaving. A single test almost always hits the easy schedule and passes.
How stress testing helps
A stress test runs the same operations across many threads for many iterations, hoping the scheduler eventually produces a dangerous interleaving:
- raise thread counts above core counts to force context switches
- repeat the test thousands of times
- add random short delays to vary timing
The more interleavings you sample, the higher the chance of catching a latent bug.
Its limits
Stress testing is probabilistic. It can raise confidence but never proves correctness, and a rare bug may still slip through because the harmful schedule was never sampled.
Key idea
Stress testing samples many interleavings by piling on threads and iterations, raising the odds of hitting a rare bug while never proving its absence.