The Monitoring and Alerting Mention

You cannot fix what you cannot see

A complete design includes how you would observe the system in production. Mentioning monitoring and alerting signals operational maturity: it shows that you think about running the system, not just building it.

The pillars of observability

Metrics are numeric measurements recorded over time, such as latency and error rate.
Logs are detailed records of individual events.
Traces follow a single request as it passes through many services, showing where its time was spent.
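
To make the three pillars concrete, here is a minimal sketch in Python using only the standard library. The names METRICS, LOGS, SPANS, and handle_request are hypothetical, and the in-memory containers stand in for a real metrics store, log pipeline, and tracing backend. The key detail is the shared trace id: it ties the log line for a request to every span that request produced.

```python
import json
import time
import uuid
from collections import defaultdict

# Hypothetical in-memory stand-ins for a metrics backend, log pipeline, and tracer.
METRICS = defaultdict(list)  # metric name -> list of (timestamp, value) samples
LOGS = []                    # one JSON string per event
SPANS = []                   # one dict per traced unit of work


def record_metric(name, value):
    """Metric: a number sampled over time."""
    METRICS[name].append((time.time(), value))


def log_event(trace_id, message, **fields):
    """Log: a detailed record of one event, tagged with the request's trace id."""
    LOGS.append(json.dumps({"ts": time.time(), "trace_id": trace_id,
                            "msg": message, **fields}))


def span(trace_id, service, start, end):
    """Trace span: one hop of a request; spans sharing a trace_id form one trace."""
    SPANS.append({"trace_id": trace_id, "service": service,
                  "duration_ms": (end - start) * 1000})


def handle_request(user_id):
    trace_id = uuid.uuid4().hex  # one id follows the request across services
    start = time.perf_counter()

    db_start = time.perf_counter()
    # ... a real handler would query the database here ...
    span(trace_id, "database", db_start, time.perf_counter())

    end = time.perf_counter()
    span(trace_id, "api", start, end)
    record_metric("request_latency_ms", (end - start) * 1000)
    record_metric("requests_total", 1)
    log_event(trace_id, "request served", user_id=user_id, status=200)
    return trace_id
```

After calling handle_request("u1"), filtering SPANS by the returned trace id yields the database span followed by the api span, and the last entry in LOGS carries the same id.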

What to watch

The golden signals of latency, traffic, errors, and saturation.
Business metrics like sign-ups or orders per minute.
Dependency health for databases and downstream services.
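
As an illustration, the sketch below turns one time window of requests into the four golden signals. It assumes a hypothetical Request record holding a latency and a success flag, and it approximates saturation as in-flight work divided by capacity; real systems often measure saturation through CPU, memory, or queue depth instead. Business metrics such as orders per minute can be computed the same way as traffic, by counting events in the window.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class Request:
    latency_ms: float
    ok: bool  # False for a server error or timeout


def golden_signals(requests, window_s, in_flight, capacity):
    """Summarize one time window of completed requests into the golden signals.

    Assumptions: `requests` holds every request completed during the window,
    and saturation is approximated as in-flight work over capacity.
    """
    if not requests:
        raise ValueError("need at least one request in the window")
    latencies = sorted(r.latency_ms for r in requests)
    # quantiles(n=100) returns the 99 cut points; index 98 is the 99th percentile.
    p99 = quantiles(latencies, n=100)[98] if len(latencies) >= 2 else latencies[0]
    errors = sum(1 for r in requests if not r.ok)
    return {
        "latency_p99_ms": p99,
        "traffic_rps": len(requests) / window_s,
        "error_rate": errors / len(requests),
        "saturation": in_flight / capacity,
    }
```

Reporting a high percentile rather than the mean is deliberate: an average can look healthy while the slowest users wait far longer.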

Alert with care

Alerts should fire on symptoms users feel, like a rising error rate or latency, not on every internal blip. Too many alerts cause alert fatigue: on-call engineers learn to ignore pages and then miss the ones that matter. State that you would alert on the golden signals and tie the thresholds to your latency and availability goals.
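
One common way to tie an alert threshold to an availability goal is an error-budget burn-rate alert, popularized by Google's SRE Workbook. The sketch below assumes a 99.9% availability goal and a 300 ms p99 latency goal; AlertPolicy, burn_rate, and should_page are hypothetical names, and the 14.4 threshold is one commonly cited value rather than a universal rule.

```python
from dataclasses import dataclass


@dataclass
class AlertPolicy:
    availability_slo: float = 0.999       # assumed goal: 99.9% of requests succeed
    latency_p99_target_ms: float = 300.0  # assumed latency goal
    # 14.4 = spending 2% of a 30-day error budget in one hour (0.02 * 720 h).
    burn_rate_threshold: float = 14.4


def burn_rate(error_rate, slo):
    """How many times faster than allowed the error budget is being spent."""
    budget = 1.0 - slo  # fraction of requests allowed to fail, e.g. 0.1%
    return error_rate / budget


def should_page(policy, err_rate_1h, err_rate_5m, p99_ms):
    """Return the reasons to page; an empty list means stay quiet.

    Error rates are failed/total fractions over each window. The 1-hour window
    filters out brief blips; the 5-minute window confirms the problem is
    still happening, so the page stops soon after recovery.
    """
    reasons = []
    slo = policy.availability_slo
    if (burn_rate(err_rate_1h, slo) >= policy.burn_rate_threshold
            and burn_rate(err_rate_5m, slo) >= policy.burn_rate_threshold):
        reasons.append("error budget burning too fast")
    if p99_ms > policy.latency_p99_target_ms:
        reasons.append("p99 latency above target")
    return reasons
```

Requiring both windows to exceed the threshold is what suppresses internal blips: a short spike raises the 5-minute error rate but barely moves the 1-hour rate, so nobody is paged.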

Key idea

Round out a design with metrics, logs, and traces, and alert on user-facing symptoms tied to your latency and availability goals.
