System Design

The Failure Scenarios Discussion

Walking through what breaks when a component dies and how the system survives.

Assume things fail

A mature design assumes every component eventually fails. Interviewers ask what happens when a server, a database, or a whole zone goes down. Walking through these scenarios shows that you design for reality and not only for the happy path.

Common failures

  • A node crashes and its load must move elsewhere.
  • A database primary dies and a replica must take over.
  • A network partition splits the system in two.
  • A dependency slows down and threatens to drag you with it.

Surviving each

  • Redundancy so no single node is critical.
  • Failover to promote a healthy replica.
  • Timeouts and a capped number of retries so the cost of a slow dependency is bounded.
  • Circuit breakers to stop cascading failures.
  • Graceful degradation to serve a reduced experience.
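
To make the last two bullets concrete, here is a minimal sketch of a circuit breaker that falls back to a degraded response. The class name, the `max_failures` and `reset_after` parameters, and the injectable `clock` are all assumptions made for illustration; production libraries differ in detail.

```python
import time


class CircuitBreaker:
    """Stops calling a failing dependency until a cool-down has passed."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds
        self.clock = clock                # injectable so tests can fake time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, fallback):
        # While open, skip the dependency and serve the degraded result.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cool-down elapsed: allow one trial call (half-open state).
            # A single further failure will reopen the circuit.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result
```

A caller would wrap each request to the dependency, for example `breaker.call(fetch_recommendations, lambda: [])`, where `fetch_recommendations` is a hypothetical function that should enforce its own timeout. The empty list is the reduced experience: the page still renders, just without recommendations.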

No single point of failure

The recurring goal is removing single points of failure. For each component, ask what happens if it dies and whether the system keeps serving. If the answer is total outage, add redundancy or a fallback.
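
The "what happens if it dies" question can be sketched as a simple audit over a design. This is only an illustration: the function name, the component names, and the idea of modelling a design as instance counts plus a set of components with fallbacks are all assumptions.

```python
def single_points_of_failure(replicas, has_fallback=()):
    """Return components whose death would cause a total outage.

    replicas: dict mapping component name -> number of running instances.
    has_fallback: names the system can live without (degraded, not down).
    """
    return sorted(
        name
        for name, count in replicas.items()
        if count <= 1 and name not in has_fallback
    )


# Hypothetical design: the cache is a single instance, but the system
# can fall back to the database when the cache is gone.
design = {"load_balancer": 2, "app_server": 4, "db_primary": 1, "cache": 1}
print(single_points_of_failure(design, has_fallback={"cache"}))
# prints ['db_primary']
```

Here the lone database primary is the only component flagged, which points at the fix the lesson names: add a replica that can be promoted.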

Key idea

Assume every component fails, then design detection, failover, and graceful degradation so no single death takes the whole system down.
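
Detection and failover can be sketched with heartbeats: a node counts as dead once it has been silent for longer than a timeout, and the first healthy node in promotion order takes over. The function name, the `timeout` value, and the node names are assumptions. Real systems also need consensus or fencing so that two nodes never act as primary at once, which this sketch leaves out.

```python
def elect_primary(nodes, last_heartbeat, now, timeout=5.0):
    """Return the node that should currently act as primary.

    nodes: list with the current primary first, then replicas in
           promotion order.
    last_heartbeat: dict mapping node -> time its last heartbeat arrived.
    now: current time, in the same units as the heartbeat timestamps.
    """
    for node in nodes:
        seen = last_heartbeat.get(node)
        if seen is not None and now - seen <= timeout:
            return node
    return None  # nothing healthy: fall back to a degraded mode


# Hypothetical cluster: db-1 has been silent for 20 seconds.
beats = {"db-1": 80.0, "db-2": 99.0, "db-3": 98.5}
print(elect_primary(["db-1", "db-2", "db-3"], beats, now=100.0))
# prints db-2
```

Returning `None` when no node is healthy makes the caller decide explicitly how to degrade instead of silently routing traffic to a dead primary.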

Check yourself

1. What is the recurring goal when discussing failures?

2. What does a circuit breaker protect against?