Feature Scaling at Serving

Many models expect features on a comparable scale. Scaling at serving time is a classic place where subtle bugs and leakage hide.

The right statistics

Normalization, such as subtracting a mean and dividing by a standard deviation, depends on statistics computed from data. The rule is that these statistics must come only from the training set:

Fit the scaler on training data to learn the mean and standard deviation.
Store those fixed numbers as part of the model artifact.
Apply the same stored numbers to validation, test, and serving inputs.
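
The three steps above can be sketched in Python as follows. This is a minimal illustration using only the standard library. The names fit_scaler and apply_scaler, the JSON storage format, and the sample values are assumptions made for the example, not the API of any particular library.

```python
import json
import statistics


def fit_scaler(train_values):
    """Learn the mean and standard deviation from training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    # Guard against a constant feature, which would cause division by zero.
    return {"mean": mean, "std": std if std > 0 else 1.0}


def apply_scaler(scaler, values):
    """Scale any split using the stored training statistics."""
    return [(v - scaler["mean"]) / scaler["std"] for v in values]


# Step 1: fit on training data (illustrative values).
train = [2.0, 4.0, 6.0, 8.0]
scaler = fit_scaler(train)

# Step 2: store the fixed numbers so they can travel with the model artifact.
stored = json.dumps(scaler)
loaded = json.loads(stored)

# Step 3: apply the same stored numbers to every other input.
validation_scaled = apply_scaler(loaded, [5.0])
serving_scaled = apply_scaler(loaded, [10.0])
```

Note that apply_scaler never looks at the statistics of the values it receives. It only reads the stored mean and standard deviation, which is what keeps validation, test, and serving inputs on the training scale.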

Why not refit at serving

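Recomputing statistics on serving data creates training/serving skew: the model sees inputs scaled differently from those it was trained on. Computing them on the full dataset, validation and test included, is worse, because it leaks held-out information into training. A single live request has no meaningful mean or standard deviation of its own, so it must reuse the training statistics. Refitting per batch in production would make identical inputs scale differently depending on their neighbors, so the same request could receive different predictions.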
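
A small sketch can make the per-batch problem concrete. The training statistics, the batches, and the function names below are hypothetical values chosen for illustration. The same input is scaled once with fixed training statistics and once with statistics refit on whichever batch it happens to arrive in.

```python
import statistics


def scale_with(mean, std, x):
    """Standardize x with the given statistics."""
    return (x - mean) / std


def refit_and_scale(batch, x):
    """Anti-pattern: recompute the statistics from the serving batch."""
    return scale_with(statistics.fmean(batch), statistics.pstdev(batch), x)


# Hypothetical training statistics, fixed when the scaler was fit.
TRAIN_MEAN, TRAIN_STD = 50.0, 10.0

x = 60.0
batch_a = [60.0, 40.0]  # x arrives next to a smaller value
batch_b = [60.0, 80.0]  # x arrives next to a larger value

# Refit per batch: the same x lands on opposite sides of zero.
refit_a = refit_and_scale(batch_a, x)
refit_b = refit_and_scale(batch_b, x)

# Fixed training statistics: the result does not depend on the batch.
fixed = scale_with(TRAIN_MEAN, TRAIN_STD, x)
```

With these numbers, the refit version maps the same request to a positive value in one batch and a negative value in the other, while the fixed version always produces the same result. A batch containing a single request would be worse still, since its standard deviation is zero.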

Packaging the scaler

For this reason, the scaler is treated as part of the model: the two are versioned and shipped together. Many serving bugs trace back to a scaler that was refit, lost, or mismatched to the model it accompanies.
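
One way to keep the scaler and model together is to bundle them in a single versioned artifact and refuse to serve if anything is missing or mismatched. The sketch below assumes a toy linear model and a JSON artifact. The names build_artifact, load_artifact, and predict, as well as the field layout, are hypothetical and stand in for whatever packaging format a real system uses.

```python
import json


def build_artifact(version, weights, scaler):
    """Bundle model parameters and scaler statistics under one version."""
    return json.dumps({"version": version, "weights": weights, "scaler": scaler})


def load_artifact(blob, expected_version):
    """Load the bundle, failing loudly on a lost or mismatched scaler."""
    artifact = json.loads(blob)
    if artifact.get("version") != expected_version:
        raise ValueError("artifact version does not match the expected version")
    if "scaler" not in artifact:
        raise ValueError("artifact is missing its scaler statistics")
    return artifact


def predict(artifact, x):
    """Scale with the bundled statistics, then apply the toy linear model."""
    s = artifact["scaler"]
    scaled = (x - s["mean"]) / s["std"]
    w = artifact["weights"]
    return w["slope"] * scaled + w["intercept"]


blob = build_artifact(
    "v3",
    {"slope": 2.0, "intercept": 1.0},
    {"mean": 50.0, "std": 10.0},
)
artifact = load_artifact(blob, "v3")
prediction = predict(artifact, 60.0)
```

Because predict reads the statistics from the same artifact as the weights, there is no code path in which the model runs against a scaler from a different training run.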

Key idea

Scaling statistics are learned once on training data, shipped with the model, and reused unchanged at serving to avoid leakage and skew.
