Model Performance Monitoring

Watching live accuracy so a quietly decaying model is caught before users feel it.

Why models decay silently

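A model that scored well in offline tests can degrade in production as the world shifts: user behavior changes, new kinds of inputs appear, and the patterns the model learned stop matching reality. Unlike a crashed server, a degraded model still returns answers, so its failures are silent. Monitoring turns that silence into signal.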

What to track

Quality metrics like accuracy, precision, recall, or RMSE measured on real traffic.
Proxy metrics when labels are slow, such as click rate or downstream conversion.
Operational metrics like latency and error rate that affect the user experience.
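As a minimal sketch, assuming binary labels where 1 marks the positive class, the quality metrics listed above can be computed on a batch of labeled production traffic in plain Python. The function names `classification_metrics` and `rmse` are illustrative, not part of any library; libraries such as scikit-learn provide equivalent, more complete implementations.

```python
import math
from typing import Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Accuracy, precision, and recall for binary labels (1 = positive class)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Precision/recall are undefined with an empty denominator; report 0.0.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error for regression outputs."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))
```

For example, `classification_metrics([1, 0, 1, 1], [1, 1, 0, 1])` yields an accuracy of 0.5 and a precision and recall of 2/3 each.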

The labeling delay problem

Ground truth often arrives late; a fraud label, for example, may take weeks to confirm. Monitoring therefore blends fast proxy metrics with slower confirmed metrics that are computed once the true labels land.
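One way to blend the two kinds of signal, sketched under assumptions: each prediction is logged with a request id and a hypothetical boolean `proxy_signal` (for instance, whether the user clicked) that is available at serving time, while confirmed labels arrive later in a dictionary keyed by request id. The names `PredictionRecord` and `blended_report` are illustrative. The report keeps the proxy and confirmed numbers separate and adds label coverage, so an accuracy computed from only a few early labels is not mistaken for the full picture.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PredictionRecord:
    request_id: str
    predicted: int       # model output, e.g. 1 = "fraud"
    proxy_signal: bool   # hypothetical fast signal recorded at serving time


def blended_report(records: List[PredictionRecord],
                   labels: Dict[str, int]) -> dict:
    """Combine a fast proxy rate with confirmed accuracy on labels that have landed.

    `labels` maps request_id -> true label and contains only the requests
    whose ground truth has arrived so far.
    """
    if not records:
        raise ValueError("no predictions in this window")
    proxy_rate = sum(r.proxy_signal for r in records) / len(records)
    labeled = [r for r in records if r.request_id in labels]
    confirmed_accuracy: Optional[float] = None
    if labeled:
        correct = sum(r.predicted == labels[r.request_id] for r in labeled)
        confirmed_accuracy = correct / len(labeled)
    return {
        "proxy_rate": proxy_rate,
        "confirmed_accuracy": confirmed_accuracy,
        # Share of predictions that already have ground truth; a low value means
        # confirmed_accuracy rests on a small, possibly unrepresentative sample.
        "label_coverage": len(labeled) / len(records),
    }
```

Reporting coverage next to the confirmed metric matters because labels that arrive early may not be representative of the ones still outstanding.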

Building the loop

Log every prediction with a stable request id so labels can be joined later.
Aggregate metrics over sliding windows to smooth noise.
Compare against a baseline captured at deployment time.
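The three steps above can be combined in a small in-memory class. This is a sketch under stated assumptions, not a production design: the class name `AccuracyMonitor`, the count-based window over the last `window_size` labeled predictions, and the `tolerance` threshold for flagging a drop below the deployment-time baseline are all hypothetical choices.

```python
from collections import deque
from typing import Deque, Dict, Optional


class AccuracyMonitor:
    """Joins late-arriving labels to logged predictions and tracks windowed accuracy."""

    def __init__(self, baseline_accuracy: float, window_size: int = 1000,
                 tolerance: float = 0.02) -> None:
        # Accuracy measured when the model was deployed.
        self.baseline_accuracy = baseline_accuracy
        # Hypothetical allowed drop before the model is flagged.
        self.tolerance = tolerance
        # request_id -> predicted label, waiting for ground truth.
        self.pending: Dict[str, int] = {}
        # Correct/incorrect outcomes for the most recent labeled predictions;
        # deque(maxlen=...) discards the oldest entry automatically.
        self.window: Deque[bool] = deque(maxlen=window_size)

    def log_prediction(self, request_id: str, predicted: int) -> None:
        """Step 1: record every prediction under a stable request id."""
        self.pending[request_id] = predicted

    def record_label(self, request_id: str, true_label: int) -> None:
        """Join a late label to its prediction and add the outcome to the window."""
        predicted = self.pending.pop(request_id, None)
        if predicted is None:
            return  # Unknown or already-labeled request: nothing to join.
        self.window.append(predicted == true_label)

    def window_accuracy(self) -> Optional[float]:
        """Step 2: accuracy over the sliding window, or None if it is empty."""
        if not self.window:
            return None
        return sum(self.window) / len(self.window)

    def is_degraded(self) -> bool:
        """Step 3: compare windowed accuracy against the deployment baseline."""
        accuracy = self.window_accuracy()
        return accuracy is not None and accuracy < self.baseline_accuracy - self.tolerance
```

A count-based window keeps the arithmetic simple; a time-based window (for example, the last 24 hours) is a common alternative when traffic volume varies. The `pending` dictionary also grows without bound if some labels never arrive, so a real system would likely expire old entries or keep the prediction log in durable storage rather than memory.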

Good monitoring answers one question on demand: is the model still as good as it was on the day we shipped it?

Key idea