Machine Learning

Model Serving Infrastructure

How a trained model becomes a reliable, scalable prediction service.

From artifact to service

A trained model is a file. Serving wraps it in an API, scales it, and keeps it healthy.

Core components

  • Model server: loads the artifact and exposes a predict endpoint
  • Load balancer: spreads requests across replicas
  • Autoscaler: adds replicas under load and removes them when idle
  • Model registry: versions artifacts and tracks which one is live
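
How the four components fit together can be sketched in a few lines. This is a minimal in-memory illustration, not a production design: every name here (ModelRegistry, ModelServer, RoundRobinBalancer, desired_replicas) is hypothetical, the "artifact" is a plain Python callable, and the scaling rule is a simple proportional one assumed for the example.

```python
import math
from itertools import cycle
from typing import Callable, Dict, List, Optional

# Stand-in for a loaded artifact: any callable from features to a score.
Model = Callable[[List[float]], float]


class ModelRegistry:
    """Hypothetical in-memory registry: stores versions, tracks the live one."""

    def __init__(self) -> None:
        self._versions: Dict[str, Model] = {}
        self.live_version: Optional[str] = None

    def register(self, version: str, model: Model) -> None:
        self._versions[version] = model

    def promote(self, version: str) -> None:
        if version not in self._versions:
            raise KeyError(f"unknown version: {version}")
        self.live_version = version

    def load_live(self) -> Model:
        if self.live_version is None:
            raise RuntimeError("no live version has been promoted")
        return self._versions[self.live_version]


class ModelServer:
    """One replica: loads the live artifact once and answers predict calls."""

    def __init__(self, name: str, registry: ModelRegistry) -> None:
        self.name = name
        self._model = registry.load_live()
        self.version = registry.live_version

    def predict(self, features: List[float]) -> dict:
        return {
            "replica": self.name,
            "version": self.version,
            "prediction": self._model(features),
        }


class RoundRobinBalancer:
    """Spreads requests across replicas in turn."""

    def __init__(self, replicas: List[ModelServer]) -> None:
        self._next = cycle(replicas)

    def predict(self, features: List[float]) -> dict:
        return next(self._next).predict(features)


def desired_replicas(current: int, load_per_replica: float,
                     target_per_replica: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Assumed proportional rule: scale the count by observed / target load."""
    wanted = math.ceil(current * load_per_replica / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))
```

With two replicas behind the balancer, consecutive requests alternate between them, and desired_replicas(2, 150, 100) asks for a third replica because each one is running at 1.5 times its target load.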

Deployment patterns

  • Embedded: the model runs inside the application process, so there is no network hop to reach it; lowest latency
  • Service: the model runs as a separate prediction microservice; easier to scale and update independently
  • Edge: the model runs on the device; no network round trip
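
The difference between the embedded and service patterns shows up in how the application calls the model. The sketch below is illustrative only: EmbeddedPredictor, RemotePredictor, http_post, the JSON shape {"features": ...} / {"prediction": ...}, and the URL in the usage note are all assumptions, not part of any real serving API.

```python
import json
import urllib.request
from typing import Callable, List


class EmbeddedPredictor:
    """Embedded pattern: the model is called in-process, no network involved."""

    def __init__(self, model: Callable[[List[float]], float]) -> None:
        self._model = model

    def predict(self, features: List[float]) -> float:
        return self._model(features)


def http_post(url: str, body: bytes) -> bytes:
    """Default transport: POST a JSON body and return the raw response."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=2.0) as response:
        return response.read()


class RemotePredictor:
    """Service pattern: the model lives behind an HTTP endpoint.

    The transport is injectable so the class can be exercised without a
    running server; the request and response schema are assumed.
    """

    def __init__(self, url: str,
                 transport: Callable[[str, bytes], bytes] = http_post) -> None:
        self._url = url
        self._transport = transport

    def predict(self, features: List[float]) -> float:
        body = json.dumps({"features": features}).encode("utf-8")
        reply = json.loads(self._transport(self._url, body))
        return float(reply["prediction"])
```

Both classes expose the same predict method, so application code can switch from EmbeddedPredictor(model) to RemotePredictor("http://model.internal/predict") (a placeholder URL) without changing its call sites. The remote version pays a network round trip on every call in exchange for independent scaling and updates.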

Safe rollouts

  • Canary: send a small slice of traffic to the new model first
  • Shadow: run the new model alongside the old without serving its output
  • Rollback: keep the previous version one click away
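
The three rollout techniques above can be sketched as one small router. This is a simplified illustration under stated assumptions: RolloutRouter and its methods are hypothetical names, models are plain callables, and canary assignment uses a hash of the request ID so the same request always lands on the same side.

```python
import hashlib
from typing import Callable, List, Optional, Tuple

Model = Callable[[List[float]], float]


class RolloutRouter:
    """Hypothetical router supporting shadow, canary, promote, and rollback."""

    def __init__(self, stable: Model) -> None:
        self.stable = stable
        self.previous: Optional[Model] = None
        self.candidate: Optional[Model] = None
        self.canary_fraction = 0.0
        self.shadow = False
        # (served output, shadow output) pairs kept for offline comparison.
        self.shadow_log: List[Tuple[float, float]] = []

    def start_shadow(self, candidate: Model) -> None:
        self.candidate, self.shadow, self.canary_fraction = candidate, True, 0.0

    def start_canary(self, candidate: Model, fraction: float) -> None:
        self.candidate, self.shadow, self.canary_fraction = candidate, False, fraction

    @staticmethod
    def _bucket(request_id: str) -> float:
        """Map a request ID to a stable value in [0, 1)."""
        digest = hashlib.sha256(request_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def predict(self, request_id: str, features: List[float]) -> float:
        if self.candidate is not None and self.shadow:
            served = self.stable(features)
            try:
                self.shadow_log.append((served, self.candidate(features)))
            except Exception:
                pass  # a failing shadow model must never affect the response
            return served
        if self.candidate is not None and self._bucket(request_id) < self.canary_fraction:
            return self.candidate(features)
        return self.stable(features)

    def promote(self) -> None:
        """Make the candidate live, keeping the old version for rollback."""
        if self.candidate is None:
            raise RuntimeError("no candidate to promote")
        self.previous, self.stable = self.stable, self.candidate
        self.candidate, self.shadow, self.canary_fraction = None, False, 0.0

    def rollback(self) -> None:
        """Swap back to the previous version and stop any rollout in progress."""
        if self.previous is None:
            raise RuntimeError("no previous version to roll back to")
        self.stable, self.previous = self.previous, self.stable
        self.candidate, self.shadow, self.canary_fraction = None, False, 0.0
```

In shadow mode users only ever see the old model's output while both outputs are logged for comparison. In canary mode roughly canary_fraction of request IDs are answered by the new model. Rollback is a pointer swap, which is what keeps it fast.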

Decoupling the model service from the app lets you update the model without redeploying everything.

Key idea

Serving turns a file into a versioned, autoscaling service with safe rollouts and instant rollback.

Check yourself

1. What does a shadow deployment do?

2. What is the benefit of a separate prediction microservice over embedding the model?