Feature Scaling at Serving
Many models expect features on a comparable scale. Scaling at serving time is a classic place where subtle bugs and leakage hide.
The right statistics
Normalization, such as subtracting a mean and dividing by a standard deviation, depends on statistics computed from data. The rule is that these statistics must come only from the training set:
- Fit the scaler on training data to learn the mean and standard deviation.
- Store those fixed numbers as part of the model artifact.
- Apply the same stored numbers to validation, test, and serving inputs.
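The three steps above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not a reference implementation. The function names fit_scaler and transform, the JSON storage format, and the sample values are assumptions made for the example.

```python
import json
import statistics


def fit_scaler(train_values):
    """Learn the mean and standard deviation from training data only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)
    # Guard against a constant feature, which would otherwise divide by zero.
    return {"mean": mean, "std": std if std > 0 else 1.0}


def transform(values, scaler):
    """Apply the stored statistics; nothing is recomputed here."""
    return [(v - scaler["mean"]) / scaler["std"] for v in values]


# Step 1: fit on the training set (hypothetical sample values).
train = [2.0, 4.0, 6.0, 8.0]
scaler = fit_scaler(train)

# Step 2: store the fixed numbers alongside the model artifact.
artifact = json.dumps(scaler)

# Step 3: validation, test, and serving inputs all reuse the stored numbers.
loaded = json.loads(artifact)
serving_scaled = transform([5.0], loaded)
```

Here the single serving value equals the training mean, so it scales to zero regardless of what other requests arrive with it.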
Why not refit at serving
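Recomputing statistics on the full dataset leaks information from validation and test data into training. Recomputing them on serving data creates training-serving skew, because the model then sees inputs scaled differently from the ones it was trained on. A single live request has no meaningful mean or standard deviation of its own, so it must reuse the training statistics. Refitting per batch in production would make identical inputs scale differently depending on their neighbors, so the same request could receive different predictions.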
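The neighbor effect is easy to reproduce. The sketch below assumes hypothetical training statistics (mean 50, standard deviation 10) and two made-up serving batches that both contain the value 60. It compares scaling with the fixed training statistics against refitting on each batch.

```python
import statistics


def standardize(values, mean, std):
    return [(v - mean) / std for v in values]


# Statistics fixed at training time (hypothetical values for illustration).
TRAIN_MEAN, TRAIN_STD = 50.0, 10.0


def scale_with_training_stats(batch):
    return standardize(batch, TRAIN_MEAN, TRAIN_STD)


def scale_with_batch_stats(batch):
    # Anti-pattern: refit on whatever happens to arrive at serving time.
    return standardize(batch, statistics.fmean(batch), statistics.pstdev(batch))


# The first element, 60.0, is the same input in both batches.
batch_a = [60.0, 40.0, 50.0]
batch_b = [60.0, 100.0, 80.0]

fixed_a = scale_with_training_stats(batch_a)[0]
fixed_b = scale_with_training_stats(batch_b)[0]
refit_a = scale_with_batch_stats(batch_a)[0]
refit_b = scale_with_batch_stats(batch_b)[0]

print(fixed_a, fixed_b)  # identical: the input is scaled the same way each time
print(refit_a, refit_b)  # opposite signs: the result depends on the neighbors
```

With the fixed statistics, 60 always maps to the same scaled value. With per-batch refitting, it lands above the mean in one batch and below it in the other, so the model would receive contradictory inputs for the same request.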
Packaging the scaler
Because of this, the scaler is treated as part of the model: the two are versioned and shipped together. Many serving bugs trace back to a scaler that was refit separately, lost, or mismatched to the model it accompanies.
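One way to make a lost or mismatched scaler harder to ship is to store the scaling statistics and the model parameters in a single versioned bundle. The sketch below is one possible layout, not a standard format. The ModelBundle class, its field names, the JSON serialization, and the toy linear model are all assumptions made for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ModelBundle:
    """Hypothetical artifact: model weights and scaler statistics travel together."""
    model_version: str
    weights: list
    scaler_mean: list
    scaler_std: list


def save_bundle(bundle):
    return json.dumps(asdict(bundle))


def load_bundle(payload):
    bundle = ModelBundle(**json.loads(payload))
    # Fail at load time if the scaler does not match the model's feature count.
    n = len(bundle.weights)
    if len(bundle.scaler_mean) != n or len(bundle.scaler_std) != n:
        raise ValueError("scaler statistics do not match model features")
    return bundle


def predict(bundle, features):
    # Scale with the bundled statistics, then apply a toy linear model.
    scaled = [
        (x - m) / s
        for x, m, s in zip(features, bundle.scaler_mean, bundle.scaler_std)
    ]
    return sum(w * z for w, z in zip(bundle.weights, scaled))


bundle = ModelBundle(
    model_version="v1",
    weights=[0.5, -1.0],
    scaler_mean=[10.0, 0.0],
    scaler_std=[2.0, 4.0],
)
restored = load_bundle(save_bundle(bundle))
score = predict(restored, [12.0, 4.0])
```

Because the statistics live in the same artifact as the weights, deploying a model version also deploys the scaler it was trained with, and a shape mismatch is rejected when the bundle is loaded rather than surfacing later as bad predictions.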
Key idea
Scaling statistics are learned once on training data, shipped with the model, and reused unchanged at serving to avoid leakage and skew.