The lost request
In a microservice system, one user request might touch twenty services. When it is slow, each service's logs show only that service's slice of the work, so no single log tells you where the time went. Distributed tracing stitches the whole journey back together.
Traces and spans
- A trace represents one request as it travels through the system.
- Each unit of work within it is a span, with a start time and duration.
- Spans nest, so you can see that the database call inside the order service took most of the time.
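To make the trace and span vocabulary concrete, here is a minimal sketch of a span as a data structure. The `Span` class, its field names, the `render` helper, and the sample timings are all hypothetical, chosen only for illustration; real tracing libraries define their own, richer models.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One unit of work inside a trace (hypothetical minimal model)."""
    trace_id: str                      # shared by every span in the trace
    span_id: str                       # unique to this span
    name: str
    start_ms: float                    # offset from the start of the request
    duration_ms: float
    parent_id: Optional[str] = None    # None marks the root span
    children: List["Span"] = field(default_factory=list)

    def child(self, span_id, name, start_ms, duration_ms):
        """Create a nested span that inherits this span's trace id."""
        span = Span(self.trace_id, span_id, name, start_ms, duration_ms,
                    parent_id=self.span_id)
        self.children.append(span)
        return span


def render(span, depth=0):
    """Return the span tree as indented lines, one line per span."""
    lines = [f"{'  ' * depth}{span.name} ({span.duration_ms:.0f} ms)"]
    for c in span.children:
        lines.extend(render(c, depth + 1))
    return lines


# Made-up timings: a checkout request whose time is mostly a database call.
root = Span("t1", "s1", "checkout", 0, 500)
order = root.child("s2", "order-service", 20, 450)
db = order.child("s3", "db query", 40, 400)
print("\n".join(render(root)))
```

The indentation in the printed tree mirrors the nesting: the database span sits inside the order service span, which sits inside the request as a whole.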
Context propagation
The mechanism is a trace id passed along in request headers from service to service. Every span records the same trace id, along with the id of its parent span, so a collector can reassemble them into one nested timeline even though they came from many machines.
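The header hand-off can be sketched in a few lines. This example assumes a header shaped like the W3C Trace Context `traceparent` value (`version-traceid-parentid-flags`); the functions `inject`, `extract`, and `handle_request` are hypothetical helpers rather than any library's API, and plain dictionaries stand in for real HTTP requests.

```python
import secrets


def new_trace_id():
    return secrets.token_hex(16)   # 32 hex characters


def new_span_id():
    return secrets.token_hex(8)    # 16 hex characters


def inject(headers, trace_id, span_id):
    """Return a copy of the headers carrying the current trace context."""
    out = dict(headers)
    out["traceparent"] = f"00-{trace_id}-{span_id}-01"
    return out


def extract(headers):
    """Read (trace_id, parent_span_id) from incoming headers, or None."""
    value = headers.get("traceparent")
    if value is None:
        return None
    parts = value.split("-")
    if len(parts) != 4:
        return None                # malformed header: treat as absent
    _version, trace_id, parent_id, _flags = parts
    return trace_id, parent_id


def handle_request(service, headers, collected):
    """Record a span for this service and return headers for the next hop."""
    ctx = extract(headers)
    if ctx is None:
        trace_id, parent_id = new_trace_id(), None   # first hop starts the trace
    else:
        trace_id, parent_id = ctx                    # later hops join it
    span_id = new_span_id()
    collected.append({"service": service, "trace_id": trace_id,
                      "span_id": span_id, "parent_id": parent_id})
    return inject(headers, trace_id, span_id)


# Simulate one request passing through three services (names are made up).
collected = []
headers = handle_request("gateway", {}, collected)
headers = handle_request("orders", headers, collected)
headers = handle_request("database", headers, collected)
```

After the three hops, every entry in `collected` carries the same trace id, and each span's `parent_id` points at the span of the service that called it. That is all a collector needs to rebuild the timeline.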
A trace view shows at a glance which span dominated the latency, turning a vague "the request is slow" into a pinpointed bottleneck.
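One way to find the dominating span programmatically is to compare each span's self time, meaning its duration minus the time spent in its direct children. The sketch below assumes children run one after another without overlapping; the span records, field names, and timings are invented for illustration.

```python
from collections import defaultdict

# Hypothetical collected spans for a single trace.
spans = [
    {"id": "a", "parent": None, "name": "gateway", "duration_ms": 500},
    {"id": "b", "parent": "a", "name": "order-service", "duration_ms": 450},
    {"id": "c", "parent": "b", "name": "db query", "duration_ms": 400},
    {"id": "d", "parent": "a", "name": "auth-service", "duration_ms": 30},
]


def self_times(spans):
    """Map span name to its duration minus its direct children's durations."""
    child_total = defaultdict(float)
    for s in spans:
        if s["parent"] is not None:
            child_total[s["parent"]] += s["duration_ms"]
    return {s["name"]: s["duration_ms"] - child_total[s["id"]] for s in spans}


def bottleneck(spans):
    """Return the name of the span with the largest self time."""
    times = self_times(spans)
    return max(times, key=times.get)


print(bottleneck(spans))
```

Self time matters because the outermost span always has the longest total duration. Subtracting the children shows where the time was actually spent, which in this made-up trace is the database query.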
Key idea
Distributed tracing propagates a trace id across services so one request becomes a single timeline of spans you can analyze.