Data-Centric vs Model-Centric

Two levers, one goal

You can raise performance by changing the model or by improving the data. Model-centric work tunes the architecture, loss, and hyperparameters while holding the data fixed. Data-centric work fixes labels, adds examples, and sharpens label definitions while holding the model fixed.

Model-centric: new layers, regularization, optimizer changes.
Data-centric: relabeling, deduplication, class balancing, better collection.
Both are valid; the question is which one pays off more right now.

When data-centric wins

In many real systems, messy data limits performance more than a weak model does. Inconsistent labels and missing slices cap accuracy no matter how clever the architecture.

Noisy or inconsistent labels confuse any model.
Missing slices leave whole subgroups unlearned.
A small clean dataset often beats a large dirty one.
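
The first two problems above can be checked for mechanically. What follows is a minimal sketch, not a definitive audit: it assumes the dataset is a list of (text, label) pairs, and the helper names `find_label_conflicts` and `slice_counts`, the toy examples, and the word-count slicing rule are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict


def find_label_conflicts(examples):
    """Return inputs that appear with more than one distinct label."""
    labels_by_text = defaultdict(set)
    for text, label in examples:
        # Light normalization (an assumption) so near-identical copies group together.
        labels_by_text[text.strip().lower()].add(label)
    return {t: sorted(ls) for t, ls in labels_by_text.items() if len(ls) > 1}


def slice_counts(examples, slice_fn):
    """Count examples per slice; a tiny or absent slice is a coverage gap."""
    return Counter(slice_fn(text) for text, _ in examples)


# Toy data, invented for this sketch.
examples = [
    ("Great product", "pos"),
    ("great product ", "neg"),  # same input, conflicting label
    ("Terrible", "neg"),
    ("ok I guess", "neu"),
]

conflicts = find_label_conflicts(examples)
counts = slice_counts(
    examples,
    lambda t: "one_word" if len(t.split()) == 1 else "multi_word",
)
print(conflicts)
print(dict(counts))
```

Conflicting labels on the same input point to relabeling or a sharper label definition; a thin slice points to better collection.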

When model-centric wins

If labels are clean and data is plentiful, the bottleneck is capacity or inductive bias, and model changes help most.

Diagnose the bottleneck before choosing a lever.
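
One way to diagnose the bottleneck is a small error analysis: a reviewer tags each misclassified validation example with a likely cause, and the tally shows which lever the errors point to. The sketch below rests on assumptions: the tag names, the `summarize_errors` helper, and the 50% cutoff are hypothetical, and the tag counts are made-up illustration data rather than results.

```python
from collections import Counter

# Hypothetical cause tags that count as data problems; any other tag
# is treated as a model-side error in this sketch.
DATA_TAGS = {"wrong_label", "ambiguous_definition", "missing_slice"}


def summarize_errors(tags, data_cutoff=0.5):
    """Return (share of errors per tag, suggested lever).

    data_cutoff is an arbitrary illustrative threshold, not a standard value.
    """
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no errors to analyze")
    shares = {tag: n / total for tag, n in counts.items()}
    data_share = sum(s for tag, s in shares.items() if tag in DATA_TAGS)
    lever = "data-centric" if data_share >= data_cutoff else "model-centric"
    return shares, lever


# Made-up tags for ten reviewed errors.
tags = ["wrong_label"] * 6 + ["missing_slice"] * 2 + ["model_confused"] * 2
shares, lever = summarize_errors(tags)
print(shares, lever)
```

The tally is only as good as the tagging, so it helps to review a random sample of errors rather than the most memorable ones.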

Key idea

Improving data and improving the model are complementary levers; error analysis tells you whether noisy data or limited model capacity is the binding constraint right now.
