Data Quality and Validation

Checking data against expectations so bad data is caught before it reaches consumers.

5 min read · advanced

Trust needs checks

By default a pipeline passes along whatever it receives, so a broken upstream feed can silently poison every downstream report. Data quality checks state expectations about the data as assertions and fail loudly when reality violates them.

Dimensions to check

  • Completeness means required fields are present and row counts match expectations.
  • Validity means values fit their type, range, and allowed set, like a status in a known list.
  • Uniqueness means keys are not duplicated.
  • Consistency means related values agree, like totals matching their line items.
  • Freshness means data arrived within its expected window.
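The five dimensions above can be sketched as checks over a single batch. This is a minimal Python sketch, assuming a batch is a list of dicts from a hypothetical orders table. The field names, the allowed status set, the tolerance, and the `check_batch` helper are all illustrative, not any specific library's API.

```python
import math
from datetime import datetime, timedelta, timezone

# Hypothetical order schema and allowed values, for illustration only.
REQUIRED_FIELDS = ("order_id", "status", "total", "items", "loaded_at")
ALLOWED_STATUSES = {"pending", "shipped", "cancelled"}
DIMENSIONS = ("completeness", "validity", "uniqueness", "consistency", "freshness")


def check_batch(rows, min_rows=1, max_age=timedelta(hours=24), now=None):
    """Return {dimension: [failure messages]}; an empty list means that dimension passed."""
    now = now or datetime.now(timezone.utc)
    failures = {dim: [] for dim in DIMENSIONS}

    # Completeness: enough rows, and every required field present in each row.
    if len(rows) < min_rows:
        failures["completeness"].append(f"expected at least {min_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        missing = [f for f in REQUIRED_FIELDS if row.get(f) is None]
        if missing:
            failures["completeness"].append(f"row {i}: missing {missing}")

    seen_ids = set()
    for i, row in enumerate(rows):
        # Validity: status comes from the allowed set, total is a non-negative number.
        status, total = row.get("status"), row.get("total")
        if status is not None and status not in ALLOWED_STATUSES:
            failures["validity"].append(f"row {i}: unknown status {status!r}")
        if total is not None and (not isinstance(total, (int, float)) or total < 0):
            failures["validity"].append(f"row {i}: invalid total {total!r}")

        # Uniqueness: each order_id appears only once in the batch.
        key = row.get("order_id")
        if key is not None:
            if key in seen_ids:
                failures["uniqueness"].append(f"row {i}: duplicate order_id {key!r}")
            seen_ids.add(key)

        # Consistency: the order total equals the sum of its line-item amounts.
        items = row.get("items")
        if isinstance(total, (int, float)) and isinstance(items, list):
            item_sum = sum(item.get("amount", 0) for item in items)
            if not math.isclose(total, item_sum, abs_tol=0.005):
                failures["consistency"].append(f"row {i}: total {total} != items {item_sum}")

    # Freshness: the newest row landed within the expected window.
    stamps = [row["loaded_at"] for row in rows if row.get("loaded_at") is not None]
    if stamps and now - max(stamps) > max_age:
        failures["freshness"].append(f"newest row is {now - max(stamps)} old, limit {max_age}")

    return failures
```

Returning failures grouped by dimension, rather than raising on the first problem, leaves the caller free to decide which dimensions are critical and which only warn.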

Where checks run

Validation runs as a gate between pipeline stages. A common pattern blocks promotion of any batch that fails a critical check and quarantines it for review, while consumers keep reading the last good data. Softer issues raise warnings without halting the pipeline. Recording check results over time also reveals slow drifts, such as a null rate creeping upward across batches, before they become outages.
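The gate pattern can be sketched in a few classes, again as a minimal illustration rather than any particular tool's API. The `Check` and `Gate` classes, the `null_rate` metric, and the `is_drifting` helper are hypothetical. Drift is detected here with one simple assumed rule (recent average versus earlier average); real systems use many other rules.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable


# Hypothetical gate sketch; all names are illustrative.
@dataclass
class Check:
    name: str
    severity: str                    # "critical" blocks promotion; "warning" only flags
    metric: Callable[[list], float]  # measures the batch, e.g. a null rate
    max_allowed: float               # the check fails when metric > max_allowed


@dataclass
class Gate:
    checks: list
    serving: list = field(default_factory=list)     # last good batch; consumers read this
    quarantine: list = field(default_factory=list)  # (batch, failed check names) held for review
    history: dict = field(default_factory=dict)     # check name -> metric value per batch

    def submit(self, batch):
        """Run every check, record the results, then promote or quarantine the batch."""
        critical, warnings = [], []
        for check in self.checks:
            value = check.metric(batch)
            self.history.setdefault(check.name, []).append(value)
            if value > check.max_allowed:
                if check.severity == "critical":
                    critical.append(check.name)
                else:
                    warnings.append(check.name)
        if critical:
            # Block promotion: the previous good batch keeps serving.
            self.quarantine.append((batch, critical))
            return "quarantined", critical, warnings
        self.serving = batch
        return "promoted", critical, warnings


def null_rate(field_name):
    """Metric factory: share of rows where field_name is missing."""
    def metric(batch):
        if not batch:
            return 0.0  # empty batches are left to a separate row-count check
        return sum(1 for row in batch if row.get(field_name) is None) / len(batch)
    return metric


def is_drifting(values, window=3, tolerance=0.05):
    """Flag a metric whose recent average exceeds its earlier average by more than tolerance."""
    if len(values) < 2 * window:
        return False  # not enough history to compare
    baseline = mean(values[:-window])
    recent = mean(values[-window:])
    return recent - baseline > tolerance


gate = Gate(checks=[
    Check("non_empty", "critical", lambda b: 1.0 if not b else 0.0, 0.0),
    Check("email_null_rate", "warning", null_rate("email"), 0.5),
])
print(gate.submit([{"email": "a@example.com"}, {"email": "b@example.com"}]))
# ('promoted', [], [])
print(gate.submit([]))
# ('quarantined', ['non_empty'], [])
```

The empty batch is quarantined while `gate.serving` still holds the first batch, and the per-check `history` is the record a drift rule such as `is_drifting` reads.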

Key idea

Data quality gates assert completeness, validity, uniqueness, consistency, and freshness, blocking or quarantining bad batches so broken upstream data never silently reaches consumers.

Check yourself

1. What does a validity check verify?

2. What is a common response to a batch failing a critical quality check?

3. Why record check results over time?