Three ways to grow
You can make a network bigger by increasing its depth (more layers), its width (more channels per layer), or its input resolution. Most prior work scaled only one of these axes and hit diminishing returns as the model grew.
The compound idea
EfficientNet observed that the three axes are linked. A higher-resolution image needs more layers to enlarge the receptive field and more channels to capture finer patterns. So it scales all three together with a single compound coefficient.
- Depth, width, and resolution each get a fixed base constant, which is raised to the power of the compound coefficient.
- One user dial raises all three in balanced proportion.
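As a rough illustration of the rule above, the sketch below computes the three multipliers from one coefficient. The constants 1.2, 1.1, and 1.15 are the values reported for the EfficientNet-B0 baseline and are not stated in this text, so treat them as an assumption. The function names and the rounding rules are hypothetical and simplified; released models use their own rounding.

```python
import math

# Base constants reported for the EfficientNet-B0 baseline (an assumption
# here; this text does not state them). Order: depth, width, resolution.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scale(phi, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Return (depth, width, resolution) multipliers for coefficient phi."""
    return alpha ** phi, beta ** phi, gamma ** phi


def flops_multiplier(depth, width, resolution):
    """Approximate FLOP growth of a conv net: linear in depth,
    quadratic in width and in resolution."""
    return depth * width ** 2 * resolution ** 2


def scale_baseline(base_layers, base_channels, base_resolution, phi):
    """Apply the multipliers to a hypothetical baseline stage.

    Rounding is simplified for illustration only.
    """
    d, w, r = compound_scale(phi)
    return (
        math.ceil(base_layers * d),
        round(base_channels * w),
        round(base_resolution * r),
    )
```

With these constants, each unit increase of the coefficient multiplies the approximate FLOP count by a little under 2, so the single dial maps fairly directly onto a compute budget.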
Finding the balance
A small grid search on a baseline finds the best ratio between the three base constants under a fixed compute constraint. That ratio is then held fixed while the compound coefficient is raised to scale the model up, avoiding a fresh search at every size.
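The search described above might look roughly like the following sketch. It assumes the compute constraint takes the form alpha * beta^2 * gamma^2 ≈ 2, which is the form used for EfficientNet but is not spelled out in this text. `grid_search_ratios` and `toy_score` are hypothetical names; in practice the scoring function would train and evaluate the scaled baseline, which the toy stand-in here does not do.

```python
import itertools


def grid_search_ratios(score_fn, step=0.05, max_value=2.0, target=2.0, tol=0.1):
    """Search constants >= 1 whose FLOP cost a * b**2 * g**2 is near target.

    score_fn is a hypothetical callable (alpha, beta, gamma) -> score,
    standing in for "train the scaled baseline and measure accuracy".
    """
    n = int(round((max_value - 1.0) / step)) + 1
    grid = [round(1.0 + i * step, 10) for i in range(n)]
    best, best_score = None, float("-inf")
    for a, b, g in itertools.product(grid, repeat=3):
        if abs(a * b ** 2 * g ** 2 - target) > tol:
            continue  # outside the fixed compute budget
        score = score_fn(a, b, g)
        if score > best_score:
            best, best_score = (a, b, g), score
    return best, best_score


def toy_score(a, b, g):
    # Stand-in for real training: an arbitrary peak, not measured data.
    return -((a - 1.2) ** 2 + (b - 1.1) ** 2 + (g - 1.15) ** 2)


best, best_score = grid_search_ratios(toy_score)
```

Because the search runs once on the small baseline, its cost stays modest; larger models reuse `best` and change only the compound coefficient.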
Why it works
Balanced scaling avoids wasting capacity. Adding only depth leaves the extra layers working on a low-resolution input, and adding only width leaves the network too shallow to use the extra channels. Coordinating the three keeps each layer useful and yields better accuracy per FLOP.
Key idea
EfficientNet scales depth, width, and resolution together using one compound coefficient applied to fixed per-axis constants found by a small search, giving better accuracy per FLOP than scaling any single axis.