Executing ahead of time
Modern processors do not run instructions strictly in program order. An out-of-order core fetches many instructions, finds the ones whose inputs are ready, and executes them early to keep its execution units busy while slower operations, such as loads that miss in the cache, are still pending.
Keeping a single thread correct
Even though execution is scrambled, a single thread sees its own results in order. The core tracks dependencies and uses a reorder buffer to retire results in program order, so register and memory effects become architecturally visible in sequence for that thread.
- Instructions execute when operands are ready, not when fetched.
- Results retire in program order from the reorder buffer.
- Mispredicted speculative work is squashed before it becomes visible.
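
The points above can be sketched with a small simulation. This is a toy model under loud assumptions, not a description of any real core: execution units are unlimited, there is no register renaming or speculation, each register is written at most once, and the function name `simulate`, the instruction names, and the latencies are all hypothetical. It only shows that completion order can differ from program order while retirement order does not.

```python
# Toy model only: unlimited execution units, no register renaming,
# and every register written at most once. All names are hypothetical.

def simulate(program):
    """program: list of (name, dest, sources, latency) in program order.
    Returns (completion_order, retirement_order) as lists of names."""
    ready_at = {}   # register -> cycle at which its value is available
    finish = []     # finish[i] = cycle at which instruction i completes
    for name, dest, sources, latency in program:
        # Start as soon as every source operand is ready (cycle 0 if
        # no earlier instruction in the program produces it).
        start = max((ready_at.get(s, 0) for s in sources), default=0)
        ready_at[dest] = start + latency
        finish.append(start + latency)

    # The reorder buffer retires from its head: an instruction retires
    # only once it has completed and every older one has retired.
    retire, latest = [], 0
    for done in finish:
        latest = max(latest, done)
        retire.append(latest)

    idx = range(len(program))
    by_completion = sorted(idx, key=lambda i: (finish[i], i))
    by_retirement = sorted(idx, key=lambda i: (retire[i], i))
    names = [p[0] for p in program]
    return ([names[i] for i in by_completion],
            [names[i] for i in by_retirement])


PROGRAM = [
    ("load", "r1", ["r0"], 10),       # slow: models a cache miss
    ("add",  "r2", ["r1"], 1),        # depends on the load
    ("mul",  "r3", ["r4", "r5"], 3),  # independent of the load
    ("sub",  "r6", ["r3"], 1),        # depends on the mul
]

completed, retired = simulate(PROGRAM)
print(completed)  # ['mul', 'sub', 'load', 'add']
print(retired)    # ['load', 'add', 'mul', 'sub']
```

In this sketch the `mul` and `sub` finish long before the slow `load`, yet they wait behind it in the reorder buffer, so the retirement list matches program order.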
Why other threads still see reordering
The in-order illusion holds only for the issuing thread. Other threads can observe the effects of its memory operations in a different order, because stores drain through buffers and caches asynchronously. That is why memory models and barriers exist: the memory model specifies which reorderings other threads may observe, and barriers enforce ordering that out-of-order execution and store buffers would otherwise not guarantee.
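
One way to make this concrete is the classic store-buffering test: thread 0 stores to `x` and then loads `y`, while thread 1 stores to `y` and then loads `x`. The sketch below is a toy enumeration, not a real hardware memory model. It assumes each thread has a private store buffer that drains to shared memory at some arbitrary later point, and it models a barrier as "the buffer must drain before the following load". The function name `outcomes` and the event names are hypothetical.

```python
from itertools import permutations

def outcomes(fence):
    """Enumerate every (r0, r1) the toy model allows.
    Thread 0: x = 1; r0 = y      Thread 1: y = 1; r1 = x
    fence=True forces each thread's buffer to drain before its load."""
    events = [(t, kind) for t in (0, 1)
              for kind in ("store", "drain", "load")]
    results = set()
    for order in permutations(events):
        pos = {e: i for i, e in enumerate(order)}
        valid = True
        for t in (0, 1):
            # Program order: the store precedes the load, and a store
            # can only drain after it has entered the buffer.
            if pos[(t, "store")] > pos[(t, "load")]:
                valid = False
            if pos[(t, "store")] > pos[(t, "drain")]:
                valid = False
            # Modelled barrier: drain completes before the load.
            if fence and pos[(t, "drain")] > pos[(t, "load")]:
                valid = False
        if not valid:
            continue

        memory = {"x": 0, "y": 0}
        buffers = {0: {}, 1: {}}
        loaded = {}
        for t, kind in order:
            mine, other = ("x", "y") if t == 0 else ("y", "x")
            if kind == "store":
                buffers[t][mine] = 1          # visible only to thread t
            elif kind == "drain":
                memory.update(buffers[t])     # now visible to everyone
                buffers[t].clear()
            else:
                # A load checks the thread's own buffer first, then memory.
                loaded[t] = buffers[t].get(other, memory[other])
        results.add((loaded[0], loaded[1]))
    return results

print(sorted(outcomes(fence=False)))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
print(sorted(outcomes(fence=True)))   # [(0, 1), (1, 0), (1, 1)]
```

Without the modelled barrier, both threads can read 0, an outcome that no simple interleaving of the two programs would produce. With it, that outcome disappears, because at least one store has reached shared memory before the other thread's load runs.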
Key idea
Out-of-order cores execute instructions early but retire them in program order for the issuing thread, while other threads can still observe reordered memory effects unless barriers intervene.