The core model
Kafka is a distributed append only log. Producers append records to topics, and consumers read them. A topic is split into partitions, each an ordered immutable sequence of records.
- Offset: every record in a partition has a monotonically increasing offset, its position in the log.
- Retention: records stay for a configured time or size, so consumers can reread history.
- Brokers: servers that host partitions, with each partition replicated across brokers for durability.
Why partitions matter
- Parallelism: more partitions let more consumers read in parallel.
- Ordering scope: order is guaranteed only within a partition, not across the whole topic.
- Key routing: records with the same key go to the same partition, keeping per key order.
Durability
- Each partition has a leader and followers. Writes go to the leader and replicate to followers.
- A write is acknowledged once enough in sync replicas have it, controlled by the acks setting.
Consumers track their own position
Unlike a queue that deletes on read, Kafka keeps records and each consumer group stores its own offset, so different readers progress independently and can replay.
Key idea
Kafka is a partitioned replicated append only log where ordering and parallelism live at the partition level and consumers track offsets, enabling both pub sub and replay.