Machine Learning

The Depthwise Separable Convolution

Splitting a convolution into spatial and channel steps to cut compute.

The cost of standard convolution

A standard convolution mixes spatial positions and channels in a single step, so its cost per output position scales with kernel area times input channels times output channels. On mobile compute budgets that is often too expensive.
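To make the scaling concrete, here is a minimal sketch that counts multiplies for one standard convolution layer. It assumes stride 1 and padding that keeps the output the same size as the input. The function name and the example layer shape are illustrative and not taken from any particular network.

```python
def standard_conv_multiplies(height, width, kernel, in_channels, out_channels):
    """Multiplies for one standard convolution layer.

    Assumes stride 1 and 'same' padding, so there is one output
    position per input position.
    """
    # Each output position combines a kernel x kernel window across all
    # input channels, once for every output channel.
    per_position = kernel * kernel * in_channels * out_channels
    return height * width * per_position


# Hypothetical layer: 56x56 feature map, 3x3 kernel, 128 -> 128 channels.
print(standard_conv_multiplies(56, 56, 3, 128, 128))
```

Doubling both channel counts quadruples the result, which is why wide layers dominate the budget.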

Two cheaper steps

Depthwise separable convolution factors the operation:

  • A depthwise step applies one spatial filter per input channel, mixing space but not channels.
  • A pointwise step applies 1×1 convolutions, mixing channels but not space.

Together they stand in for the full convolution at far lower cost.
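The two steps can be sketched in plain Python on nested lists. This is an illustration under simplifying assumptions (stride 1, no padding, no bias, input indexed as x[channel][row][col]), not how a framework would implement it. The function names are hypothetical.

```python
def depthwise(x, dw):
    """Apply one k x k filter per input channel (stride 1, no padding).

    x[c][i][j] is the input; dw[c] is the k x k filter for channel c.
    Channels are never combined in this step.
    """
    k = len(dw[0])
    out_h = len(x[0]) - k + 1
    out_w = len(x[0][0]) - k + 1
    out = []
    for c, channel in enumerate(x):
        plane = [[sum(channel[i + di][j + dj] * dw[c][di][dj]
                      for di in range(k) for dj in range(k))
                  for j in range(out_w)]
                 for i in range(out_h)]
        out.append(plane)
    return out


def pointwise(x, pw):
    """1x1 convolution: pw[o][c] weights input channel c into output channel o.

    Each spatial position is handled independently of its neighbours.
    """
    h, w = len(x[0]), len(x[0][0])
    return [[[sum(pw[o][c] * x[c][i][j] for c in range(len(x)))
              for j in range(w)]
             for i in range(h)]
            for o in range(len(pw))]


def depthwise_separable(x, dw, pw):
    """Spatial filtering per channel, then channel mixing per position."""
    return pointwise(depthwise(x, dw), pw)
```

Note that the depthwise filters hold k × k weights per input channel and the pointwise weights hold one value per input-output channel pair, so no single weight touches both space and channels.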

The savings

The ratio of separable cost to standard cost is roughly one over the number of output channels plus one over the kernel area. For a 3×3 kernel with many output channels the first term is negligible and the ratio approaches one ninth, which is about an eight to nine times reduction in multiplies. That is why MobileNet and similar designs lean on it.
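The ratio can be checked with a short calculation. The sketch below counts multiplies per output position only, ignoring bias and activation costs. The function name and the channel counts in the loop are illustrative choices.

```python
def separable_cost_ratio(kernel, out_channels):
    """Separable multiplies divided by standard multiplies.

    Per output position, with M input and N output channels:
      standard  = k*k*M*N
      separable = k*k*M   (depthwise)  +  M*N   (pointwise)
    Dividing gives 1/N + 1/(k*k); the input channel count M cancels.
    """
    return 1 / out_channels + 1 / (kernel * kernel)


# Speedup for a 3x3 kernel at a few illustrative output channel counts.
for n in (64, 256, 1024):
    speedup = 1 / separable_cost_ratio(3, n)
    print(n, round(speedup, 2))
```

As the number of output channels grows, the speedup approaches the kernel area, which is 9 for a 3×3 kernel.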

The trade

You lose some expressive power because spatial and channel mixing no longer happen jointly. In practice the accuracy drop is small and the efficiency gain is large, which is a favorable bargain on constrained hardware.

Key idea

Depthwise separable convolution splits filtering into a per-channel spatial step and a 1×1 channel-mixing step, cutting compute by roughly eight to nine times for 3×3 kernels with little accuracy loss.

Check yourself

1. What does the depthwise step mix?

2. Why is this factorization popular on mobile?