Machine Learning

Data Centric vs Model Centric

Decide whether to improve the data or the model for the next gain.

Two levers, one goal

You can raise performance by changing the model or by improving the data. Model centric work tunes architecture, loss, and hyperparameters on fixed data. Data centric work fixes labels, adds examples, and sharpens definitions on a fixed model.

  • Model centric: new layers, regularization, optimizer changes.
  • Data centric: relabeling, deduping, balancing, better collection.
  • Both are valid; the question is which pays more now.

When data centric wins

In many real systems the data is messier than the model is weak. Inconsistent labels and missing slices cap accuracy no matter how clever the architecture is.

  • Noisy or inconsistent labels confuse any model.
  • Missing slices leave whole subgroups unlearned.
  • A small clean dataset often beats a large dirty one.
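
The first two problems above can be checked mechanically before any retraining. The following is a minimal sketch, assuming each example is a (text, label, slice) tuple. The function names, the toy reviews, the expected slice list, and the min_count threshold are all illustrative assumptions, not part of any library.

```python
from collections import Counter, defaultdict


def find_label_conflicts(examples):
    """Return normalized inputs that appear with more than one distinct label."""
    labels_by_text = defaultdict(set)
    for text, label, _slice in examples:
        # Light normalization so case and whitespace variants count as duplicates.
        key = " ".join(text.lower().split())
        labels_by_text[key].add(label)
    return {t: sorted(ls) for t, ls in labels_by_text.items() if len(ls) > 1}


def thin_slices(examples, expected_slices, min_count):
    """Return expected slices that have fewer than min_count examples."""
    counts = Counter(s for _text, _label, s in examples)
    # Counter returns 0 for a slice that never appears, so absent slices show up too.
    return {s: counts[s] for s in expected_slices if counts[s] < min_count}


# Toy data, invented for illustration only.
examples = [
    ("Great battery life", "positive", "phones"),
    ("great  battery life", "negative", "phones"),
    ("Screen cracked in a week", "negative", "phones"),
    ("Fan is too loud", "negative", "laptops"),
]

conflicts = find_label_conflicts(examples)
thin = thin_slices(examples, ["phones", "laptops", "tablets"], min_count=2)
print(conflicts)
print(thin)
```

Here the same review appears with both labels, and the "tablets" slice has no examples at all. Both findings point to data work rather than a model change.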

When model centric wins

If labels are clean and data is plentiful, the bottleneck is most likely model capacity or inductive bias, and model changes help most.

Diagnose the bottleneck before choosing a lever.
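
One rough way to turn "clean labels and plentiful data" into a check is sketched below. It assumes you have audited a sample of labels by hand and have validation scores from training on increasing amounts of data. The function name suggest_lever and both thresholds are hypothetical choices for illustration; sensible values depend on the task.

```python
def suggest_lever(audited_label_error_rate, val_scores_by_size,
                  max_label_error=0.02, min_gain=0.005):
    """Heuristic sketch: return "model" only if labels look clean and data looks plentiful.

    audited_label_error_rate: fraction of hand-checked labels found to be wrong.
    val_scores_by_size: validation scores from training on growing subsets
        (needs at least two entries, smallest subset first).
    """
    labels_clean = audited_label_error_rate <= max_label_error
    # Treat data as plentiful if the latest increase in data barely moved the score.
    last_gain = val_scores_by_size[-1] - val_scores_by_size[-2]
    data_plentiful = last_gain < min_gain
    if labels_clean and data_plentiful:
        return "model"
    return "data"


# Clean labels, flat learning curve: more data is unlikely to help.
print(suggest_lever(0.01, [0.80, 0.86, 0.862]))
# Many wrong labels: fix the data first.
print(suggest_lever(0.08, [0.80, 0.86, 0.862]))
# Clean labels, but the score is still climbing with more data.
print(suggest_lever(0.01, [0.70, 0.78, 0.84]))
```

This is a starting point for a decision, not a rule. A borderline result is a reason to look at individual errors.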

Key idea

Improving data and improving the model are complementary levers; error analysis tells you whether noisy data or limited model capacity is the binding constraint right now.
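
Error analysis can be as simple as tagging a sample of misclassified examples with a cause and counting. The sketch below assumes a hand-tagged sample. The tag names and their mapping to a lever are hypothetical and would be defined per project.

```python
from collections import Counter

# Hypothetical mapping from an error tag to the lever that would address it.
LEVER_BY_TAG = {
    "wrong_label": "data",
    "ambiguous_definition": "data",
    "missing_slice": "data",
    "model_confused_clear_case": "model",
}


def binding_constraint(error_tags):
    """Return the lever behind most tagged errors and its share of the sample."""
    votes = Counter(LEVER_BY_TAG[tag] for tag in error_tags)
    lever, count = votes.most_common(1)[0]
    return lever, count / len(error_tags)


# Invented sample of 20 tagged validation errors.
tags = (
    ["wrong_label"] * 6
    + ["missing_slice"] * 5
    + ["model_confused_clear_case"] * 9
)

lever, share = binding_constraint(tags)
print(lever, share)
```

In this invented sample, 11 of 20 errors trace back to the data, so data work is the larger lever for now. The split is close, and repeating the tally after each round of fixes shows when the balance shifts.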

Check yourself

1. What distinguishes data centric work from model centric work?

2. When is a data centric approach most likely to help?

3. How do you decide which lever to pull?