Machine Learning

The DeepFM

Combining factorization machines with a deep net over shared embeddings.

The motivation

Wide & Deep needs hand-crafted feature crosses for its wide side. DeepFM removes that manual work: a factorization machine learns low-order feature interactions automatically, a deep network learns high-order ones, and both share the same embeddings.

The factorization machine side

A factorization machine models every pairwise feature interaction through the dot product of feature embeddings.

  • It captures second-order crosses without you naming them.
  • It generalizes to pairs that are rare or unseen in training, because each feature's embedding is learned from every pair that feature appears in.
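
The pairwise term can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it assumes one-hot fields (so each active feature contributes its embedding with weight 1), and the names `fm_pairwise_naive`, `fm_pairwise_fast`, and the example embeddings are hypothetical. The second function uses the standard identity that rewrites the sum over pairs as half of (square of sum minus sum of squares), which avoids enumerating pairs.

```python
from itertools import combinations


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def fm_pairwise_naive(embeddings):
    """Sum of dot products over every pair of active-feature embeddings."""
    return sum(dot(u, v) for u, v in combinations(embeddings, 2))


def fm_pairwise_fast(embeddings):
    """Same value without enumerating pairs:
    0.5 * sum over latent dims of ((sum_i v_i)^2 - sum_i v_i^2)."""
    total = 0.0
    for column in zip(*embeddings):  # one latent dimension at a time
        s = sum(column)
        total += s * s - sum(x * x for x in column)
    return 0.5 * total


# Hypothetical embeddings for three active features, embedding size 2.
active = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]

naive = fm_pairwise_naive(active)
fast = fm_pairwise_fast(active)
```

Both functions return the same score for `active`; the fast form grows linearly with the number of active features rather than quadratically.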

The deep side

The same embeddings feed a feed-forward network that learns complex higher-order interactions.

  • No separate embedding tables, which saves parameters.
  • Both sides see the same input representation.
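
A rough sketch of the deep side, under assumptions the lesson does not specify: the field embeddings are concatenated into one vector, hidden layers use ReLU, and the last layer outputs a single scalar. The layer sizes, the uniform initialization, and all function names here are illustrative choices only.

```python
import random


def relu(vec):
    return [max(0.0, x) for x in vec]


def linear(vec, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (an arbitrary choice for the sketch)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def deep_score(embeddings, layers):
    """Concatenate field embeddings, apply ReLU hidden layers, return a scalar."""
    h = [x for emb in embeddings for x in emb]  # flatten the shared embeddings
    for weights, bias in layers[:-1]:
        h = relu(linear(h, weights, bias))
    weights, bias = layers[-1]
    return linear(h, weights, bias)[0]


rng = random.Random(0)
# Hypothetical input: 3 fields, embedding size 2, so the network input has 6 values.
active = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
layers = [init_layer(rng, 6, 4), init_layer(rng, 4, 1)]
score = deep_score(active, layers)
```

The input to `deep_score` is the same list of embeddings a factorization machine would consume, which is the sense in which no separate embedding table is needed.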

Joint output

The FM score and the deep score are summed and passed through a sigmoid to predict click probability. Training uses binary cross-entropy end to end.

  • Because the embeddings are shared, gradients from both the FM and the deep part update them, so each side benefits from what the other learns.
  • No feature engineering of crosses is required.

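The output step and the loss for a single example might look like the following. It treats the FM score and the deep score as already-computed scalars; the function names and the `eps` clipping value are assumptions made for this sketch.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_click(fm_score, deep_score):
    """Sum the two component scores and squash to a probability."""
    return sigmoid(fm_score + deep_score)


def binary_cross_entropy(p, label, eps=1e-12):
    """Log loss for one example; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Illustrative scores that cancel out, giving a probability of 0.5.
p = predict_click(1.5, -1.5)
loss = binary_cross_entropy(p, 1)  # equals ln 2 for p = 0.5
```

In training, the gradient of this loss would flow back through both scores into the shared embeddings.
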
Key idea

DeepFM shares one embedding table between a factorization machine that learns pairwise crosses and a deep net that learns higher-order ones, removing the need for hand-crafted feature crosses.

Check yourself

1. What does DeepFM remove compared to Wide & Deep?

2. How do the FM and deep parts relate to embeddings?

3. What interaction order does the factorization machine side capture?