Trust needs checks
Pipelines silently pass whatever they receive, so a broken upstream feed can poison every downstream report. Data quality checks assert expectations about data and fail loudly when reality breaks them.
Dimensions to check
- Completeness means required fields are present and row counts match expectations.
- Validity means values fit their type, range, and allowed set, like a status in a known list.
- Uniqueness means keys are not duplicated.
- Consistency means related values agree, like totals matching their line items.
- Freshness means data arrived within its expected window.
Where checks run
Validation runs as a gate between stages. A common pattern blocks promotion of a batch that fails critical checks, quarantining it for review while letting the last good data keep serving. Warnings flag softer issues without halting the pipeline. Recording check results over time also reveals slow drifts before they become outages.
Key idea
Data quality gates assert completeness, validity, uniqueness, consistency, and freshness, blocking or quarantining bad batches so broken upstream data never silently reaches consumers.