The motivation
Wide & Deep needs hand-crafted feature crosses for its wide side. DeepFM removes that manual work: a factorization machine learns low-order feature interactions automatically, a deep network learns high-order ones, and both share the same embeddings.
The factorization machine side
A factorization machine models every pairwise feature interaction as the dot product of the two features' embeddings.
- It captures second-order crosses without you naming them.
- It works even when a specific pair is rare: each feature has a single embedding that is trained on every pair it appears in, so a rarely seen pair still gets a sensible interaction score.
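
The pairwise term described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function names `fm_pairwise_naive` and `fm_pairwise_fast` and the toy numbers are assumptions made for the example. The second function uses the standard factorization machine identity, which computes the same sum in time linear in the number of features rather than quadratic.

```python
from itertools import combinations


def fm_pairwise_naive(embeddings, values):
    """Sum of <v_i, v_j> * x_i * x_j over all feature pairs i < j."""
    total = 0.0
    for i, j in combinations(range(len(embeddings)), 2):
        dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
        total += dot * values[i] * values[j]
    return total


def fm_pairwise_fast(embeddings, values):
    """Same quantity via 0.5 * ((sum of terms)^2 - sum of squared terms),
    evaluated separately for each embedding dimension."""
    k = len(embeddings[0])
    total = 0.0
    for f in range(k):
        terms = [v[f] * x for v, x in zip(embeddings, values)]
        s = sum(terms)
        sq = sum(t * t for t in terms)
        total += 0.5 * (s * s - sq)
    return total


# Three active features with embedding size 2 (toy numbers, not learned values).
embeddings = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
values = [1.0, 1.0, 1.0]

naive = fm_pairwise_naive(embeddings, values)
fast = fm_pairwise_fast(embeddings, values)
```

Both functions return the same score for the toy input, which is why implementations typically use the second form: no pair is ever enumerated explicitly.
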
The deep side
The same embeddings feed a feed-forward network that learns complex higher-order interactions.
- No separate embedding tables, which saves parameters.
- Both sides see the same input representation.
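
As a rough sketch of the deep side, the embeddings can be concatenated into one vector and passed through a small feed-forward network. The helper names (`dense`, `relu`, `deep_score`), the single hidden layer, and the hand-picked weights below are assumptions for illustration; a real model would learn the weights and usually stack several layers.

```python
def relu(xs):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in xs]


def dense(inputs, weights, biases):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def deep_score(embeddings, hidden_w, hidden_b, out_w, out_b):
    """Concatenate the shared embeddings, apply one hidden layer, return a scalar."""
    flat = [x for v in embeddings for x in v]
    hidden = relu(dense(flat, hidden_w, hidden_b))
    return dense(hidden, [out_w], [out_b])[0]


# Two features with embedding size 2; these are the same vectors the FM side would use.
embeddings = [[1.0, 2.0], [0.5, -1.0]]
hidden_w = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # hypothetical weights
hidden_b = [0.0, 0.0]
out_w = [2.0, 3.0]
out_b = 0.5

score = deep_score(embeddings, hidden_w, hidden_b, out_w, out_b)
```

Note that `deep_score` takes the embedding vectors directly instead of looking them up in its own table, which is what "no separate embedding tables" amounts to in code.
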
Joint output
The FM score and the deep score are summed and passed through a sigmoid to predict click probability. The whole model is trained end to end with binary cross-entropy.
- Shared embeddings mean the FM and deep parts reinforce each other.
- No feature engineering of crosses is required.
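
The output step can be sketched as follows, taking the two scores as already computed. The function names and the example scores are assumptions for illustration; the clipping constant `eps` is a common numerical safeguard, not something the description above specifies.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_click(fm_score, deep_score):
    """Sum the two scores, then squash to a probability."""
    return sigmoid(fm_score + deep_score)


def binary_cross_entropy(p, label, eps=1e-12):
    """Loss for one example; `label` is 1 for a click and 0 otherwise."""
    p = min(max(p, eps), 1.0 - eps)  # keep log() away from 0
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Toy scores chosen so the sum is zero.
p = predict_click(1.5, -1.5)
loss = binary_cross_entropy(p, 1)
```

Because the loss depends on the sum of both scores, its gradient flows back through the FM part and the deep part into the same embeddings, which is the sense in which the two sides are trained jointly.
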
Key idea
DeepFM shares one embedding table between a factorization machine that learns pairwise crosses and a deep net that learns higher-order ones, removing the need for hand-crafted feature crosses.