Why a pipeline
A model in a notebook is a one-off. A pipeline is the automated, repeatable sequence that takes raw data and produces a deployed model. MLOps is the discipline of building and operating these pipelines reliably.
The standard stages
- Data ingestion pulls raw records from sources into a staging area.
- Validation checks schema, ranges, and missing values before anything downstream runs.
- Feature engineering transforms raw fields into model inputs.
- Training fits the model and writes a candidate artifact.
- Evaluation scores the candidate against held-out data and a baseline.
- Deployment packages and ships the approved model to a serving system.
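The six stages above can be sketched as plain functions chained together. This is a minimal illustration, not a production design. The toy data, the one-parameter "model", the mean-predictor baseline, and every function name here are assumptions made up for the example.

```python
from statistics import mean

Rows = list[tuple[float, float]]


def ingest() -> list[dict]:
    # Hypothetical in-memory source standing in for a real staging area.
    return [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.0},
            {"x": 3.0, "y": 6.0}, {"x": 4.0, "y": 8.0}]


def validate(records: list[dict]) -> list[dict]:
    # Check schema, missing values, and an assumed valid range for x.
    for r in records:
        if set(r) != {"x", "y"}:
            raise ValueError(f"unexpected schema: {sorted(r)}")
        if r["x"] is None or r["y"] is None:
            raise ValueError("missing value")
        if not 0 <= r["x"] <= 100:
            raise ValueError(f"x out of range: {r['x']}")
    return records


def engineer_features(records: list[dict]) -> Rows:
    # Turn raw fields into (input, target) pairs.
    return [(float(r["x"]), float(r["y"])) for r in records]


def train(rows: Rows) -> float:
    # Least-squares slope for y = slope * x (a line through the origin).
    return sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)


def evaluate(slope: float, train_rows: Rows, holdout: Rows) -> bool:
    # Compare the candidate to a baseline that always predicts the training mean.
    baseline = mean(y for _, y in train_rows)
    candidate_mse = mean((slope * x - y) ** 2 for x, y in holdout)
    baseline_mse = mean((baseline - y) ** 2 for _, y in holdout)
    return candidate_mse < baseline_mse


def deploy(slope: float, registry: dict) -> None:
    # Stand-in for packaging and shipping to a serving system.
    registry["model"] = {"slope": slope}


def run_pipeline(registry: dict) -> bool:
    rows = engineer_features(validate(ingest()))
    train_rows, holdout = rows[:3], rows[3:]
    slope = train(train_rows)
    approved = evaluate(slope, train_rows, holdout)
    if approved:
        deploy(slope, registry)
    return approved
```

Each function takes the previous stage's output as its only input, which is what makes the stages individually testable.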
Why stages matter
Each stage has a clear input and output, so a failure is isolated and observable. You can rerun one stage, cache its output, and reason about correctness step by step rather than debugging a single monolithic script.
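One way to make a stage rerunnable and cacheable is to key its output on a hash of its input, so that rerunning with unchanged input skips the work. The sketch below assumes stage inputs and outputs are JSON-serializable. The `run_stage` helper and the cache file layout are hypothetical and not taken from any particular framework.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Callable


def run_stage(name: str, fn: Callable[[Any], Any], payload: Any,
              cache_dir: Path) -> Any:
    """Run one stage, reusing a cached result when the input is unchanged."""
    # The cache key is derived from the stage input; sort_keys makes it stable.
    digest = hashlib.sha256(
        json.dumps(payload, sort_keys=True).encode("utf-8")
    ).hexdigest()[:16]
    path = cache_dir / f"{name}-{digest}.json"

    if path.exists():
        # Cache hit: return the stored output without calling the stage.
        return json.loads(path.read_text(encoding="utf-8"))

    result = fn(payload)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Because the key depends only on the stage name and its input, a change to the stage's code does not invalidate the cache in this sketch. A real setup would likely include a code or config version in the key as well.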
A mature pipeline runs on a schedule or trigger, logs every step, and deploys only when evaluation passes.
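Step logging and a deployment gate can be sketched with the standard `logging` module. The `logged_stage`, `gate`, and `run` helpers, the `GateFailed` exception, and the idea of a single numeric score checked against a threshold are all assumptions for illustration. Scheduling or triggering is left to whatever external scheduler invokes `run`.

```python
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("pipeline")


class GateFailed(RuntimeError):
    """Raised when the evaluation score does not meet the threshold."""


def logged_stage(name: str, fn: Callable[..., Any], *args: Any) -> Any:
    # Log the start, the outcome, and the duration of every stage.
    start = time.perf_counter()
    logger.info("stage=%s status=started", name)
    try:
        result = fn(*args)
    except Exception:
        logger.exception("stage=%s status=failed", name)
        raise
    elapsed = time.perf_counter() - start
    logger.info("stage=%s status=ok seconds=%.3f", name, elapsed)
    return result


def gate(score: float, threshold: float) -> None:
    # Stop the run before deployment if evaluation did not pass.
    if score < threshold:
        raise GateFailed(f"score {score:.3f} is below threshold {threshold:.3f}")


def run(evaluate: Callable[[], float], deploy: Callable[[], Any],
        threshold: float) -> None:
    score = logged_stage("evaluation", evaluate)
    logged_stage("gate", gate, score, threshold)
    # Reached only when the gate did not raise.
    logged_stage("deployment", deploy)
```

Raising from the gate, rather than returning a flag, means a failed evaluation cannot be ignored by accident: the deployment call is never reached.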
Key idea
An ML pipeline breaks model delivery into isolated, observable stages from ingestion to deployment so the whole flow is automated and repeatable.