Machine Learning

Model Performance Monitoring

Watching live accuracy so a quietly decaying model is caught before users feel it.

Why models decay silently

A model that scored well in offline tests can degrade in production as real-world data drifts away from what it was trained on. Unlike a crashed server, a bad model still returns answers, so its failures are silent. Monitoring turns that silence into signal.

What to track

  • Quality metrics like accuracy, precision, recall, or RMSE measured on real traffic.
  • Proxy metrics when labels are slow, such as click rate or downstream conversion.
  • Operational metrics like latency and error rate that affect the user experience.
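
As a rough illustration of the quality metrics in the first bullet, the sketch below scores a binary classifier from (predicted, actual) pairs collected on live traffic. The names `QualityReport` and `quality_report` are made up for this example, and the choice to report 0.0 when a denominator is empty is an assumption, not a standard.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    accuracy: float
    precision: float
    recall: float


def quality_report(pairs):
    """Score (predicted, actual) boolean pairs taken from real traffic."""
    tp = fp = tn = fn = 0
    for predicted, actual in pairs:
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1

    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no labeled predictions to score")

    # Assumption: report 0.0 when there are no positive predictions or labels.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return QualityReport((tp + tn) / total, precision, recall)


if __name__ == "__main__":
    sample = [(True, True), (True, False), (False, True), (False, False), (True, True)]
    print(quality_report(sample))
```

Accuracy alone can look healthy on imbalanced traffic, which is why the sketch reports precision and recall alongside it.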

The labeling delay problem

Ground truth often arrives late: a fraud label, for example, may take weeks to confirm. Monitoring therefore leans on fast proxies first and adds the slower, confirmed metrics once true labels land.
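
One way to picture that blend is a report that shows the fast proxy for all traffic next to the confirmed metric for whatever share of labels has arrived so far. This is only a sketch: `blended_view`, its arguments, and the use of click rate as the proxy are assumptions made for illustration.

```python
def blended_view(predictions, clicks, labels):
    """Combine a fast proxy with a slower confirmed metric.

    predictions: dict of request id -> predicted class
    clicks:      set of request ids that received a click (fast proxy)
    labels:      dict of request id -> true class, only for labels that have landed
    """
    total = len(predictions)
    if total == 0:
        raise ValueError("no predictions logged")

    # Proxy metric: available right away for every prediction.
    click_rate = sum(1 for rid in predictions if rid in clicks) / total

    # Confirmed metric: only computable on the labeled subset.
    labeled = [rid for rid in predictions if rid in labels]
    coverage = len(labeled) / total
    confirmed_accuracy = None
    if labeled:
        correct = sum(1 for rid in labeled if predictions[rid] == labels[rid])
        confirmed_accuracy = correct / len(labeled)

    return {
        "click_rate": click_rate,
        "label_coverage": coverage,
        "confirmed_accuracy": confirmed_accuracy,
    }


if __name__ == "__main__":
    preds = {"a": 1, "b": 0, "c": 1, "d": 0}
    print(blended_view(preds, clicks={"a", "c"}, labels={"a": 1, "b": 1}))
```

Reporting label coverage next to the confirmed accuracy makes it clear how much of the traffic that number actually describes.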

Building the loop

  • Log every prediction with a stable request id so labels can be joined later.
  • Aggregate metrics over sliding windows to smooth noise.
  • Compare against a baseline captured at deployment time.
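
The three steps above can be sketched as a small in-memory monitor. Everything here is hypothetical: the `AccuracyMonitor` class, the window size, and the tolerance are illustrative choices, and a real system would persist the prediction log rather than hold it in a dict. The window is count-based and slides in label-arrival order, which is one simple option among several.

```python
from collections import deque


class AccuracyMonitor:
    def __init__(self, baseline_accuracy, window_size=500, tolerance=0.05):
        self.baseline = baseline_accuracy        # captured at deployment time
        self.tolerance = tolerance               # allowed drop before flagging
        self.pending = {}                        # request id -> prediction awaiting a label
        self.window = deque(maxlen=window_size)  # 1 = correct, 0 = wrong

    def log_prediction(self, request_id, prediction):
        """Step 1: log every prediction under a stable request id."""
        self.pending[request_id] = prediction

    def record_label(self, request_id, label):
        """Join a late label back to its logged prediction."""
        if request_id not in self.pending:
            return False  # unknown id, or this prediction was already labeled
        prediction = self.pending.pop(request_id)
        self.window.append(1 if prediction == label else 0)
        return True

    def windowed_accuracy(self):
        """Step 2: aggregate over the sliding window to smooth noise."""
        if not self.window:
            return None
        return sum(self.window) / len(self.window)

    def is_degraded(self):
        """Step 3: compare the windowed metric against the baseline."""
        accuracy = self.windowed_accuracy()
        return accuracy is not None and accuracy < self.baseline - self.tolerance


if __name__ == "__main__":
    monitor = AccuracyMonitor(baseline_accuracy=0.9, window_size=4)
    for rid in ["r1", "r2", "r3", "r4", "r5"]:
        monitor.log_prediction(rid, 1)
    for rid, label in [("r1", 1), ("r2", 1), ("r3", 0), ("r4", 0), ("r5", 0)]:
        monitor.record_label(rid, label)
    print(monitor.windowed_accuracy(), monitor.is_degraded())
```

A tolerance band around the baseline keeps ordinary window-to-window noise from triggering alerts on every small dip.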

Good monitoring answers one question on demand: is the model still as good as the day we shipped it?

Key idea

Models fail silently, so track live quality, operational, and proxy metrics against a deployment baseline, joining late labels back to logged predictions.

Check yourself

1. Why are production model failures often described as silent?

2. Why do proxy metrics like click rate matter for monitoring?