← Lessons

quiz vs the machine

Gold1410

System Design

The Monitoring and Alerting Mention

Showing you would know when the system is unhealthy in production.

4 min read · core · beat Gold to climb

You cannot fix what you cannot see

A complete design includes how you would observe the system in production. Mentioning monitoring and alerting signals operational maturity, that you think about running the system, not just building it.

The pillars of observability

  • Metrics are numbers over time like latency and error rate.
  • Logs are detailed records of individual events.
  • Traces follow one request across many services.

What to watch

  • The golden signals of latency, traffic, errors, and saturation.
  • Business metrics like sign ups or orders per minute.
  • Dependency health for databases and downstream services.

Alert with care

Alerts should fire on symptoms users feel, like rising error rate or latency, not on every internal blip. Too many alerts cause fatigue and missed pages. State that you would alert on the golden signals and tie thresholds to your latency and availability goals.

Key idea

Round out a design with metrics, logs, and traces, and alert on user facing symptoms tied to your latency and availability goals.

Check yourself

Answer to earn rating on the learn ladder.

1. What are the golden signals to monitor?

2. Why alert on symptoms rather than every internal blip?