The problem
Logs are written on hundreds of ephemeral hosts and containers. When you need to investigate, you cannot SSH into each box, and an ephemeral host may already be gone. A log aggregation pipeline collects, transports, and indexes logs into a central, searchable store.
The stages
- Collection uses an agent, running on each host or as a sidecar, that tails log files or reads stdout. It buffers locally so a brief network or downstream outage does not lose data.
- Transport ships records over the network, often through a buffer like a message queue that absorbs spikes and decouples producers from the store.
- Processing parses, enriches, and redacts records, adding fields such as service name and dropping noisy lines.
- Storage and indexing writes records into a search engine so queries by field and time range stay fast.
- Query and visualization lets engineers search and build dashboards.
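The processing stage is the easiest one to show in code. The sketch below is a minimal illustration, not a reference implementation: the line format (`<timestamp> <LEVEL> <message>`), the function name `process`, and the choice to redact email addresses and drop `DEBUG` lines are all assumptions made for the example.

```python
import re
from typing import Optional

# Assumed line format for this example: "<timestamp> <LEVEL> <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$")
# Deliberately simple email pattern, used only to illustrate redaction.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def process(line: str, service: str) -> Optional[dict]:
    """Parse, redact, and enrich one raw log line; return None to drop it."""
    match = LINE_RE.match(line)
    if match is None:
        # Keep unparseable lines (flagged) instead of losing them silently.
        record = {"level": "UNKNOWN", "message": line, "parse_error": True}
    else:
        record = match.groupdict()

    # Drop noisy lines before they cost transport and storage.
    if record["level"] == "DEBUG":
        return None

    # Redact sensitive values, then enrich with the originating service name.
    record["message"] = EMAIL_RE.sub("[REDACTED]", record["message"])
    record["service"] = service
    return record
```

Calling `process("2024-05-01T12:00:00Z INFO login by alice@example.com", "auth")` yields a record whose `message` is `"login by [REDACTED]"` and whose `service` is `"auth"`. Returning structured fields rather than raw text is what lets the storage stage index by field.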
Design pressures
- Volume can be enormous, so sampling or dropping debug logs controls cost.
- Backpressure matters because the store can fall behind. The buffer absorbs the backlog for a while, but it is finite, so once it fills the pipeline must slow producers down or drop records.
- Retention is tiered, keeping recent logs hot and archiving older ones cheaply.
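The interaction between volume control and backpressure can be sketched with a bounded buffer. This is an illustrative in-memory stand-in for a real message queue, under assumptions of my own: the class name `LogBuffer`, the three return values, and the policy of shedding `DEBUG` records first are hypothetical choices, not a prescribed design.

```python
import queue


class LogBuffer:
    """Bounded buffer between shippers and the store (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        self._queue: queue.Queue = queue.Queue(maxsize=capacity)
        self.dropped_debug = 0  # counted so that shedding stays visible

    def offer(self, record: dict) -> str:
        """Try to enqueue a record without blocking.

        Returns "queued" on success, "dropped" if a DEBUG record was shed
        because the buffer is full, or "retry" to tell the producer to
        back off and resend later (the backpressure signal).
        """
        try:
            self._queue.put_nowait(record)
            return "queued"
        except queue.Full:
            if record.get("level") == "DEBUG":
                self.dropped_debug += 1
                return "dropped"
            return "retry"

    def drain(self, max_batch: int) -> list:
        """Remove and return up to max_batch records for the indexer."""
        batch = []
        while len(batch) < max_batch:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch
```

With a capacity of 2, two records are queued, a third `DEBUG` record is shed, and a third `ERROR` record gets `"retry"` until the consumer calls `drain`. The point of the explicit `"retry"` result is that a full buffer becomes a signal the producer can act on, rather than a silent loss.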
Key idea
A log pipeline collects, buffers, processes, and indexes logs from many hosts into one searchable store while controlling volume and backpressure.