Requirements
- Collect logs from thousands of hosts and services.
- Make logs searchable with low query latency.
- Buffer bursts so traffic spikes do not cause data loss.
High-level design
Agents ship logs through a buffer into a search index and archive.
- Collection: a lightweight agent on each host tails files and forwards lines.
- Buffer: a partitioned log such as Kafka decouples producers from consumers and absorbs bursts.
- Processing: consumers parse, enrich, and route logs to an index for search and to cheap object storage for long-term retention.
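The collection, buffer, and processing stages above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a Kafka client: `PartitionedLog`, `agent_ship`, and `process` are hypothetical names, the `LEVEL message` line format is an assumption, and plain lists stand in for the search index and object storage.

```python
import json
import zlib
from collections import deque


class PartitionedLog:
    """In-memory stand-in for a partitioned log such as a Kafka topic (hypothetical)."""

    def __init__(self, num_partitions: int = 4):
        self.partitions = [deque() for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> int:
        # Hash the key so every line from one host lands in the same partition, in order.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append(value)
        return p

    def drain(self, p: int):
        # Yield and remove every pending record in partition p, oldest first.
        while self.partitions[p]:
            yield self.partitions[p].popleft()


def agent_ship(host: str, lines, buffer: PartitionedLog) -> None:
    """Collection: forward each tailed line, tagged with the host it came from."""
    for line in lines:
        buffer.produce(host, json.dumps({"host": host, "raw": line.rstrip("\n")}))


def process(buffer: PartitionedLog, index: list, archive: list) -> None:
    """Processing: parse, enrich, and route each record to the index and the archive."""
    for p in range(len(buffer.partitions)):
        for value in buffer.drain(p):
            record = json.loads(value)
            # Assumed line format: "LEVEL message".
            level, _, message = record["raw"].partition(" ")
            record.update(level=level, message=message, partition=p)  # enrichment
            index.append(record)   # parsed copy for fast search
            archive.append(value)  # raw copy for cheap long-term retention


# Example: two hypothetical hosts ship a few lines, then one consumer pass runs.
buf = PartitionedLog()
index, archive = [], []
agent_ship("web-1", ["INFO started\n", "ERROR disk full\n"], buf)
agent_ship("db-1", ["WARN slow query\n"], buf)
process(buf, index, archive)
```

Keying by host keeps each host's lines ordered within a single partition, and keeping the raw record in the archive leaves room to re-parse it later if the parsing rules change.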
Bottlenecks
- Burst volume: an incident can trigger a log storm, so the buffer absorbs the spike and applies backpressure to producers once it fills.
- Index cost: full-text indexing is expensive, so index only recent (hot) data and archive older logs cheaply.
- Cardinality: high-cardinality fields balloon the index, so sample or drop the noisy fields before indexing.
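Each of the three mitigations above can be sketched in Python. This is a minimal illustration under stated assumptions, not a production design: the seven-day `HOT_WINDOW`, the per-field distinct-value budget, and the names `ship`, `tier_for`, and `CardinalityGuard` are all hypothetical. Backpressure is modeled as a bounded `queue.Queue` that refuses records when full, and only the "drop" option for noisy fields is shown, not sampling.

```python
import queue
from collections import defaultdict
from datetime import datetime, timedelta, timezone

HOT_WINDOW = timedelta(days=7)  # hypothetical hot-retention window


def ship(buffer: queue.Queue, record: dict) -> bool:
    """Burst volume: a bounded buffer refuses new records instead of growing without limit.

    A False return is the backpressure signal; the agent should slow down and retry.
    """
    try:
        buffer.put_nowait(record)
        return True
    except queue.Full:
        return False


def tier_for(ts: datetime, now: datetime) -> str:
    """Index cost: only records inside the hot window go to the search index."""
    return "hot" if now - ts <= HOT_WINDOW else "cold"


class CardinalityGuard:
    """Cardinality: stop indexing a field once it exceeds a distinct-value budget."""

    def __init__(self, limit: int = 1000):  # hypothetical budget per field
        self.limit = limit
        self.seen = defaultdict(set)
        self.dropped = set()

    def filter(self, record: dict) -> dict:
        kept = {}
        for key, value in record.items():
            if key in self.dropped:
                continue
            self.seen[key].add(value)
            if len(self.seen[key]) > self.limit:
                # Too many distinct values: drop the field from now on and free its set.
                self.dropped.add(key)
                del self.seen[key]
                continue
            kept[key] = value
        return kept
```

With a small budget, a per-request identifier such as `request_id` is dropped after a few records while a low-cardinality field such as `level` keeps being indexed.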
Tradeoffs
- Indexing everything enables rich search but costs heavily in storage and compute.
- Tiering data from hot to cold storage cuts cost but makes queries over old logs slower.
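The tiering tradeoff shows up at query time: a time range that reaches past the hot window has to be split, and only the recent slice is served by the index. The sketch below is a hypothetical planner in Python, assuming a fixed seven-day `HOT_WINDOW`; `plan_query` and the tier labels are illustrative names, not part of any real query engine.

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=7)  # hypothetical hot-retention window


def plan_query(start: datetime, end: datetime, now: datetime):
    """Split a time-range query into a fast hot-index part and a slow archive-scan part."""
    boundary = now - HOT_WINDOW
    plan = []
    if end > boundary:
        # Recent slice: served by the search index with low latency.
        plan.append(("hot_index", max(start, boundary), end))
    if start < boundary:
        # Older slice: must be read back from object storage, which is slower.
        plan.append(("cold_archive", start, min(end, boundary)))
    return plan
```

A query over the last day touches only the index, while a month-long query also pays for an archive read, which is the cost the cold tier trades for cheaper storage.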
Key idea
A log pipeline ships logs through a buffering log into a hot search index and a cold archive, tiering data so recent logs are fast and old logs are cheap.