System Design

Disaster Recovery RPO and RTO

The two numbers that define how much data and time a disaster may cost.

Planning for the worst day

Disaster recovery is the plan for restoring service after a major loss, such as a region going down or data being corrupted. Two numbers anchor the plan and drive its cost.

The two targets

  • Recovery point objective (RPO) is how much data you can afford to lose, measured in time. An RPO of five minutes means the most recent recoverable copy, whether a backup or a replica, must never be more than five minutes old, so at most five minutes of data is lost.
  • Recovery time objective (RTO) is how long you can be down. An RTO of one hour means service must be restored within an hour of the disaster.

A tighter RPO needs more frequent or continuous replication. A tighter RTO needs warmer standby capacity ready to take over.
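To make the two definitions concrete, here is a minimal Python sketch that checks a single incident against both targets. The function names and the timestamps are hypothetical, chosen only for illustration; the sketch assumes you know when the last recoverable copy was taken, when the disaster hit, and when service came back.

```python
from datetime import datetime, timedelta


def data_loss_window(last_good_copy: datetime, disaster: datetime) -> timedelta:
    """Everything written after the last recoverable copy is lost."""
    return disaster - last_good_copy


def downtime(disaster: datetime, restored: datetime) -> timedelta:
    """Time between the disaster and service being restored."""
    return restored - disaster


def meets_targets(last_good_copy, disaster, restored, rpo, rto) -> dict:
    """Compare the measured loss and downtime with the RPO and RTO."""
    return {
        "rpo_met": data_loss_window(last_good_copy, disaster) <= rpo,
        "rto_met": downtime(disaster, restored) <= rto,
    }


# Illustrative incident (made-up timestamps, not real data).
disaster = datetime(2024, 1, 1, 12, 0)
result = meets_targets(
    last_good_copy=datetime(2024, 1, 1, 11, 57),  # copy was 3 minutes old
    disaster=disaster,
    restored=datetime(2024, 1, 1, 13, 10),        # back after 70 minutes
    rpo=timedelta(minutes=5),
    rto=timedelta(hours=1),
)
print(result)  # {'rpo_met': True, 'rto_met': False}
```

In this example the replication was fresh enough to meet the RPO, but the restore took longer than the RTO allowed, which points at standby capacity rather than replication frequency.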

Matching strategy to targets

  • Backup and restore is cheap but slow, so it fits a loose RTO.
  • Warm standby keeps a scaled-down copy running, which suits a moderate RTO.
  • Hot standby runs a full second site, giving a near-zero RTO at high cost.
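The matching of strategy to target can be sketched as picking the cheapest option whose recovery time fits the RTO. The recovery times and cost ranks below are placeholder assumptions for illustration only, not measured figures; real values depend on the system and should come from your own drills.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Strategy:
    name: str
    typical_recovery: timedelta  # ASSUMED placeholder, not a benchmark
    relative_cost: int           # 1 = cheapest; ranking only


# Hypothetical numbers that only preserve the ordering in the lesson:
# backup and restore is slowest and cheapest, hot standby fastest and dearest.
STRATEGIES = [
    Strategy("backup and restore", timedelta(hours=24), 1),
    Strategy("warm standby", timedelta(hours=1), 2),
    Strategy("hot standby", timedelta(minutes=1), 3),
]


def cheapest_strategy(rto: timedelta) -> Optional[Strategy]:
    """Return the cheapest strategy that recovers within the RTO, if any."""
    candidates = [s for s in STRATEGIES if s.typical_recovery <= rto]
    return min(candidates, key=lambda s: s.relative_cost, default=None)


choice = cheapest_strategy(timedelta(hours=4))
print(choice.name if choice else "no strategy meets this RTO")  # warm standby
```

Returning `None` when nothing fits is a deliberate choice in this sketch: an RTO tighter than the fastest option means the target or the architecture has to change, and that should be visible rather than silently rounded to the nearest strategy.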

Test the recovery regularly, because an untested plan often fails when it is finally needed.
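One way to make a recovery test produce a usable number is to time the restore and compare it with the RTO. The sketch below assumes a hypothetical `restore` callable that performs the recovery and returns whether it succeeded; `fake_restore` is a stand-in so the example runs, not a real restore procedure.

```python
import time
from datetime import timedelta
from typing import Callable


def run_drill(restore: Callable[[], bool], rto: timedelta) -> dict:
    """Time one restore attempt and report whether it fit inside the RTO."""
    start = time.monotonic()  # monotonic clock is unaffected by clock changes
    ok = restore()
    elapsed = timedelta(seconds=time.monotonic() - start)
    return {
        "restored": ok,
        "elapsed": elapsed,
        "within_rto": ok and elapsed <= rto,
    }


def fake_restore() -> bool:
    """Stand-in for a real restore; sleeps briefly and reports success."""
    time.sleep(0.01)
    return True


report = run_drill(fake_restore, rto=timedelta(hours=1))
print(report["restored"], report["within_rto"])
```

Recording the elapsed time from each drill shows whether the plan is drifting toward or past its RTO before a real disaster tests it.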

Key idea

RPO is the tolerable data loss and RTO is the tolerable downtime, and together they decide how much standby and replication you must pay for.

Check yourself

1. What does the recovery point objective measure?

2. What does a tighter recovery time objective require?

3. Why test the disaster recovery plan regularly?