System Design

Log Aggregation Pipelines

Collecting logs from many hosts into one searchable store.

When you run many machines, the logs you need are scattered across all of them. A log aggregation pipeline collects those logs into a central, searchable place so you can investigate without logging into each host.

The pipeline stages

  • A collector or agent runs on each host, tailing log files and forwarding lines
  • A buffer such as a message queue absorbs bursts so a spike does not overwhelm the store
  • A processor parses, enriches, and normalizes fields
  • A store and index make logs searchable, often with retention tiers
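
The processor stage can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the line format ("timestamp LEVEL message"), the field names, the level aliases, and the process_line helper are all assumptions made for the example.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Assumed (hypothetical) line format: "<ISO-8601 timestamp> <LEVEL> <message>"
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Za-z]+)\s+(?P<message>.*)$")

# Hypothetical aliases so that every source ends up using the same level names.
LEVEL_ALIASES = {"WARNING": "WARN", "ERR": "ERROR"}


def process_line(raw: str, host: str, service: str) -> Optional[dict]:
    """Parse one raw line, normalize its fields, and enrich it with metadata."""
    match = LINE_RE.match(raw.strip())
    if match is None:
        return None  # unparseable; a real pipeline might keep it for inspection

    try:
        ts = datetime.fromisoformat(match["ts"])
    except ValueError:
        return None
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)  # assumption: naive times are UTC

    level = match["level"].upper()
    return {
        # Normalize: one timezone and one spelling per level across all hosts.
        "timestamp": ts.astimezone(timezone.utc).isoformat(),
        "level": LEVEL_ALIASES.get(level, level),
        "message": match["message"],
        # Enrich: metadata the raw line did not carry itself.
        "host": host,
        "service": service,
    }


event = process_line(
    "2024-05-01T12:00:00+02:00 warning disk at 91%", host="web-1", service="api"
)
print(event)
```

Normalizing timestamps and level names at this stage means the store only ever sees one representation, which keeps later searches simple.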

The buffer is the stage people most often forget. Without it, a traffic surge that triples your log volume can knock over the indexing layer exactly when you need visibility most.
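
To make the effect concrete, here is a small tick-by-tick simulation. It is a sketch under stated assumptions: the arrival numbers, the store capacity, and the buffer size are invented, and an overloaded store is modeled simply as dropping the lines it cannot take.

```python
def simulate(arrivals, store_capacity, buffer_size):
    """Simulate a bounded buffer sitting in front of an indexing store.

    arrivals:       log lines arriving in each tick
    store_capacity: lines the store can index per tick
    buffer_size:    lines the buffer can hold between ticks (0 = no buffer)
    Returns (indexed, dropped, peak_backlog).
    """
    backlog = indexed = dropped = peak = 0
    for incoming in arrivals:
        backlog += incoming
        drained = min(backlog, store_capacity)  # the store takes what it can
        backlog -= drained
        indexed += drained
        overflow = max(0, backlog - buffer_size)  # anything beyond the buffer is lost
        backlog -= overflow
        dropped += overflow
        peak = max(peak, backlog)
    indexed += backlog  # whatever is still buffered drains after arrivals stop
    return indexed, dropped, peak


# Hypothetical load: 100 lines per tick, tripling to 300 for three ticks.
arrivals = [100] * 5 + [300] * 3 + [100] * 10

print(simulate(arrivals, store_capacity=150, buffer_size=0))    # (1950, 450, 0)
print(simulate(arrivals, store_capacity=150, buffer_size=500))  # (2400, 0, 450)
```

With no buffer, 450 lines are lost during the spike. With a buffer of 500 lines, the backlog peaks at 450 and then drains once traffic returns to normal, so nothing is dropped. The store never has to index more than 150 lines per tick in either case.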

Trade-offs

Indexing every field makes searches fast but costs more in storage and ingest work. Many teams index a few key fields and keep the full text in cheaper storage. Retention is a cost lever too: hot, recent logs stay searchable while older logs move to cold archives or expire.
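
Both levers can be illustrated with a toy in-memory store. This is only a sketch: the LogStore class, the choice of indexed fields, and the 7-day and 90-day windows are hypothetical values picked for the example, not recommendations.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

INDEXED_FIELDS = ("service", "level")  # hypothetical choice of key fields
HOT_DAYS, COLD_DAYS = 7, 90            # hypothetical retention windows


class LogStore:
    def __init__(self):
        self.docs = []                 # full events; stands in for cheap storage
        self.index = defaultdict(set)  # (field, value) -> ids of matching docs

    def add(self, event):
        doc_id = len(self.docs)
        self.docs.append(event)
        for field in INDEXED_FIELDS:   # only the key fields get index entries
            if field in event:
                self.index[(field, event[field])].add(doc_id)
        return doc_id

    def search(self, text=None, **filters):
        """Narrow by indexed fields first, then scan the survivors' full text."""
        unindexed = set(filters) - set(INDEXED_FIELDS)
        if unindexed:
            raise ValueError(f"not indexed: {sorted(unindexed)}")
        candidates = set(range(len(self.docs)))
        for field, value in filters.items():
            candidates &= self.index.get((field, value), set())
        hits = [self.docs[i] for i in sorted(candidates)]
        if text is not None:
            hits = [e for e in hits if text in e.get("message", "")]
        return hits


def retention_tier(event_time, now):
    """Classify an event by age: searchable, archived, or deleted."""
    age = now - event_time
    if age <= timedelta(days=HOT_DAYS):
        return "hot"
    if age <= timedelta(days=COLD_DAYS):
        return "cold"
    return "expired"


store = LogStore()
store.add({"service": "api", "level": "ERROR", "message": "timeout calling db"})
store.add({"service": "api", "level": "INFO", "message": "request ok"})
store.add({"service": "web", "level": "ERROR", "message": "timeout rendering"})
print(store.search(text="timeout", level="ERROR"))

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(retention_tier(now - timedelta(days=30), now))  # cold
```

The indexed filters cut the candidate set cheaply, and the slower text scan runs only over what remains. That is the compromise the paragraph above describes: less index to pay for, at the price of slower free-text matching.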

Key idea

A log pipeline collects, buffers, processes, and indexes logs centrally, and the buffer protects the store during traffic spikes.

Check yourself

1. Why include a buffer between collectors and the indexing store?

2. Why do many teams index only a few fields rather than all of them?