Machine Learning

EfficientNet Compound Scaling

Scaling depth, width, and resolution together with one compound rule.

Three ways to grow

You can make a network bigger by increasing its depth (more layers), its width (more channels per layer), or its input resolution (larger images). Most prior work scaled only one of these axes, and the accuracy gains shrank quickly as that axis grew.

The compound idea

EfficientNet observed that the three axes are linked: a higher-resolution image needs more layers to enlarge the receptive field and more channels to capture the finer patterns it contains. So it scales all three together with a single compound coefficient.

  • Depth, width, and resolution each get a fixed constant (α, β, γ) that is raised to the power of the compound coefficient φ.
  • One user dial, φ, raises all three in balanced proportion.
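Written out, the rule as stated in the EfficientNet paper multiplies each axis by its own constant raised to the user-chosen coefficient φ. The constraint reflects that a convolutional network's FLOPs grow roughly linearly with depth and quadratically with width and resolution, so each unit of φ roughly doubles compute:

```latex
\begin{aligned}
d &= \alpha^{\phi}, \qquad w = \beta^{\phi}, \qquad r = \gamma^{\phi} \\
&\text{subject to } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,
\quad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1 \\
\text{FLOPs} &\propto d \cdot w^{2} \cdot r^{2}
= \left(\alpha \cdot \beta^{2} \cdot \gamma^{2}\right)^{\phi} \approx 2^{\phi}
\end{aligned}
```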

Finding the balance

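With φ fixed at 1, a small grid search on a baseline network (EfficientNet-B0 in the paper) finds the best values of the three constants under a compute constraint: α · β² · γ² ≈ 2, so that each unit of φ roughly doubles FLOPs. Those constants are then reused as φ grows, avoiding a fresh search at every model size.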
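A minimal sketch of that search, assuming a hypothetical `toy_accuracy` function in place of actually training each scaled network (in practice every evaluation is a full training run). The grid spacing, tolerance, and toy score are illustrative assumptions, not the paper's settings; only the FLOPs constraint α · β² · γ² ≈ 2 and the bounds α, β, γ ≥ 1 follow the paper.

```python
import itertools
import math


def flops_factor(alpha: float, beta: float, gamma: float) -> float:
    """FLOPs multiplier at phi = 1: linear in depth, quadratic in width and resolution."""
    return alpha * beta**2 * gamma**2


def toy_accuracy(alpha: float, beta: float, gamma: float) -> float:
    """HYPOTHETICAL stand-in for 'scale the baseline, train it, measure accuracy'.

    Each axis simply has diminishing returns; the values carry no real meaning.
    """
    return sum(1.0 - math.exp(-3.0 * (x - 1.0)) for x in (alpha, beta, gamma))


def grid_search(target: float = 2.0, tol: float = 0.1, step: float = 0.05, steps: int = 20):
    """Return the (alpha, beta, gamma) with the best toy score whose FLOPs factor is near target."""
    # Candidate values 1.00, 1.05, ..., 2.00 (all >= 1, matching the paper's bounds).
    values = [round(1.0 + i * step, 2) for i in range(steps + 1)]
    best, best_score = None, -math.inf
    for alpha, beta, gamma in itertools.product(values, repeat=3):
        if abs(flops_factor(alpha, beta, gamma) - target) > tol:
            continue  # outside the compute budget for phi = 1
        score = toy_accuracy(alpha, beta, gamma)
        if score > best_score:
            best, best_score = (alpha, beta, gamma), score
    return best


if __name__ == "__main__":
    print(grid_search())
```

Replacing `toy_accuracy` with real training runs is what makes this search expensive, which is why it is run once at the small baseline size and its result reused for every larger model.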

Why it works

Balanced scaling avoids wasting capacity. Adding only depth gives the extra layers no finer input detail to work with, and adding only width leaves the network too shallow to build higher-level features. Coordinating the three keeps each added layer, channel, and pixel useful, which yields better accuracy per FLOP.
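The budget arithmetic behind "balanced" can be sketched directly. This assumes the constants reported in the EfficientNet paper for its B0 baseline (α = 1.2, β = 1.1, γ = 1.15, so α · β² · γ² ≈ 1.92) and only compares how a given FLOPs budget is spread across the axes; it says nothing about accuracy, which has to be measured.

```python
import math

# Constants reported in the EfficientNet paper for its B0 baseline.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_multipliers(phi: float) -> tuple[float, float, float, float]:
    """Depth, width, resolution multipliers and the resulting FLOPs factor for a given phi."""
    d, w, r = ALPHA**phi, BETA**phi, GAMMA**phi
    return d, w, r, d * w**2 * r**2


def single_axis_multiplier(flops: float, axis: str) -> float:
    """Multiplier that one axis alone would need to reach the same FLOPs factor."""
    if axis == "depth":
        return flops  # FLOPs grow linearly with depth
    if axis in ("width", "resolution"):
        return math.sqrt(flops)  # FLOPs grow quadratically with width or resolution
    raise ValueError(f"unknown axis: {axis}")


if __name__ == "__main__":
    d, w, r, flops = compound_multipliers(3)
    print(f"compound: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}, FLOPs x{flops:.2f}")
    for axis in ("depth", "width", "resolution"):
        print(f"{axis} only: x{single_axis_multiplier(flops, axis):.2f}")
```

At φ = 3 the compound rule grows no axis by more than about 1.73×, while spending the same roughly 7× FLOPs on a single axis means about 7× the depth or about 2.7× the width or resolution.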

Key idea

EfficientNet scales depth, width, and resolution together with one compound coefficient φ applied to three fixed constants found by a small grid search, giving better accuracy per FLOP than scaling any single axis.

Check yourself

1. What is the core insight behind compound scaling?

2. How is the ratio between the three scaling constants chosen?