System Design

Change Data Capture Pipelines

Streaming inserts, updates, and deletes out of a database by reading its transaction log.

5 min read · advanced

Moving changes, not snapshots

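Copying a whole table every night is slow, and each copy shows only the final state of a row, so changes that were made and then overwritten between copies are never seen. Change data capture (CDC) instead streams each row change out of the source database as it happens, so downstream systems stay close to in sync.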

Log-based capture

The robust approach reads the database's transaction log, such as the PostgreSQL write-ahead log (WAL) or the MySQL binlog, which already records every committed change in order. A connector tails this log and emits one event per change, tagged with its operation type: insert, update, or delete.

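  • It avoids polling queries, so it adds little load to the source.
  • It captures every change, including deletes and intermediate updates, which query-based polling can miss because a deleted row simply stops appearing in query results.
  • It preserves commit order, which matters for correctness: applying an update before the insert it depends on leaves the downstream copy in the wrong state.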
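To make the contrast with polling concrete, here is a minimal sketch in Python. It does not read a real database log: `LOG` is an in-memory stand-in, and `ChangeEvent`, `tail`, and `poll_snapshot` are hypothetical names chosen for illustration. The `lsn` field is assumed to be a log position that increases with commit order.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    lsn: int                # assumed log position; increases with commit order
    op: str                 # "insert", "update", or "delete"
    table: str
    key: int                # primary key of the affected row
    after: Optional[dict]   # row image after the change; None for deletes


# In-memory stand-in for a transaction log: changes in commit order.
LOG: List[ChangeEvent] = [
    ChangeEvent(1, "insert", "users", 7, {"name": "Ada"}),
    ChangeEvent(2, "update", "users", 7, {"name": "Ada L."}),
    ChangeEvent(3, "insert", "users", 8, {"name": "Bob"}),
    ChangeEvent(4, "delete", "users", 8, None),
]


def tail(log: List[ChangeEvent], after_lsn: int = 0) -> Iterator[ChangeEvent]:
    """Yield every logged change past a saved position, in commit order."""
    for event in log:
        if event.lsn > after_lsn:
            yield event


def poll_snapshot(log: List[ChangeEvent]) -> dict:
    """What a query-based poller sees afterwards: only the final table state."""
    rows = {}
    for event in log:
        if event.op == "delete":
            rows.pop(event.key, None)
        else:
            rows[event.key] = event.after
    return rows


ops = [event.op for event in tail(LOG)]
resumed = [event.lsn for event in tail(LOG, after_lsn=2)]
snapshot = poll_snapshot(LOG)
```

In this sketch the tailing reader sees all four changes and can resume from a saved position, while the snapshot contains only row 7 in its final form: the earlier name and the entire lifetime of row 8 are invisible to the poller.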

Applying changes downstream

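Consumers must apply changes in order and idempotently, typically by upserting on the primary key and recording deletes as tombstones. Pipelines commonly deliver at least once, so the same change may arrive again after a retry or restart; the downstream merge logic must therefore produce the same result when an event is applied a second time.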
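One way to get that property, sketched below in Python, is to remember the log position of the last change applied to each key and skip anything at or before it. The `Replica` class and its fields are hypothetical, and the sketch assumes each event carries a log position (`lsn`) that increases with commit order.

```python
from typing import Optional


class Replica:
    """Toy downstream store that applies change events idempotently."""

    def __init__(self) -> None:
        self.rows = {}      # primary key -> current row image
        self.versions = {}  # primary key -> lsn of the last applied change;
                            # kept after a delete, so it acts as a tombstone

    def apply(self, lsn: int, op: str, key: int,
              after: Optional[dict] = None) -> bool:
        """Apply one change; return False if it was a duplicate or stale."""
        if lsn <= self.versions.get(key, 0):
            return False                # already applied: safe to ignore
        if op == "delete":
            self.rows.pop(key, None)    # remove the row, keep the version
        else:
            self.rows[key] = after      # insert and update are both upserts
        self.versions[key] = lsn
        return True


events = [
    (1, "insert", 7, {"name": "Ada"}),
    (2, "update", 7, {"name": "Ada L."}),
    (3, "insert", 8, {"name": "Bob"}),
    (4, "delete", 8, None),
]

replica = Replica()
first_pass = [replica.apply(*event) for event in events]
state_after_first_pass = dict(replica.rows)

# Redeliver the whole stream, as an at-least-once pipeline might after a restart.
second_pass = [replica.apply(*event) for event in events]
```

Keeping the version for a deleted key is what stops a redelivered insert for row 8 from resurrecting it. A real store would also need a policy for eventually expiring those tombstones, which this sketch leaves out.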

Key idea

Log-based change data capture tails the transaction log to stream ordered inserts, updates, and deletes with low load on the source; downstream consumers must apply those changes in order and idempotently.

Check yourself

1. Why is log-based CDC preferred over polling for changes?

2. Why must downstream CDC consumers be idempotent?

3. What does a log-based connector capture that simple polling can miss?