Machine Learning

The Wide And Deep Model

Joining a memorizing linear model with a generalizing deep network.

Two complementary needs

A good recommender must both memorize known good combinations and generalize to unseen ones. The wide and deep model trains one component for each need jointly, so each side covers the other's weaknesses.

The wide part

The wide side is a linear model over raw and crossed features. A cross feature such as (installed app, impression app) is 1 only when both parts take specific values, so its weight lets the model memorize exact co-occurrences seen in the logs.

  • Great at remembering exact rules.
  • Cannot generalize to feature pairs it never saw.
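A minimal sketch of the wide side follows, assuming binary (one-hot) categorical features. The feature names, the "AND(...)" naming scheme, and the weight value are hypothetical, chosen only to show how a cross memorizes one exact pair and contributes nothing for a pair with no learned weight.

```python
def cross_features(features, pairs):
    """Build binary cross-product features for the given feature-name pairs.

    `features` maps a feature name to its categorical value. Each cross is
    active (value 1) only for this exact combination of values.
    """
    crosses = []
    for a, b in pairs:
        if a in features and b in features:
            crosses.append(f"AND({a}={features[a]},{b}={features[b]})")
    return crosses


def wide_logit(features, pairs, weights, bias=0.0):
    """Linear model over raw one-hot features plus their crosses.

    Every active binary feature adds its learned weight. A feature that never
    received a weight (for example an unseen pair) defaults to 0.0.
    """
    raw = [f"{k}={v}" for k, v in features.items()]
    active = raw + cross_features(features, pairs)
    return bias + sum(weights.get(name, 0.0) for name in active)


# Hypothetical example: one memorized cross with a positive weight.
pairs = [("installed_app", "impression_app")]
weights = {"AND(installed_app=netflix,impression_app=pandora)": 1.5}

seen = {"installed_app": "netflix", "impression_app": "pandora"}
unseen = {"installed_app": "netflix", "impression_app": "spotify"}
print(wide_logit(seen, pairs, weights))    # 1.5
print(wide_logit(unseen, pairs, weights))  # 0.0
```

The unseen pair scores 0.0 even if "spotify" is similar to "pandora": the wide side has no notion of similarity, only of exact matches.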

The deep part

The deep side maps each sparse categorical feature to a learned, low-dimensional dense embedding, concatenates the embeddings, and passes them through a feed-forward network.

  • Learns smooth generalizations across similar items.
  • Can overgeneralize and recommend less relevant items when interaction data is sparse.
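The forward pass of the deep side can be sketched in plain Python as below, assuming one hidden ReLU layer and a single output logit. The embedding tables and weights are hypothetical and hand-set here (in practice they are learned); "pandora" and "spotify" are given similar vectors to mimic what training would do for similar apps.

```python
def relu(z):
    return max(0.0, z)


def dense_layer(inputs, weights, biases, activation=None):
    """Fully connected layer: weights[j] is the weight row for output unit j."""
    out = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        out.append(activation(z) if activation else z)
    return out


def deep_logit(features, tables, hidden_w, hidden_b, out_w, out_b):
    """Embed each sparse feature, concatenate, then apply a feed-forward net."""
    x = []
    for name in sorted(features):                # fixed concatenation order
        x.extend(tables[name][features[name]])   # dense vector replaces a one-hot id
    h = dense_layer(x, hidden_w, hidden_b, relu)  # hidden layer with ReLU
    return dense_layer(h, [out_w], [out_b])[0]    # single output logit


# Hypothetical, hand-set embeddings and weights.
tables = {
    "impression_app": {"pandora": [1.0, 0.0], "spotify": [0.9, 0.1]},
    "installed_app": {"netflix": [0.0, 1.0]},
}
hidden_w = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
hidden_b = [0.0, 0.0]
out_w, out_b = [0.5, 0.5], 0.0

score_seen_pair = deep_logit({"installed_app": "netflix", "impression_app": "pandora"},
                             tables, hidden_w, hidden_b, out_w, out_b)
score_unseen_pair = deep_logit({"installed_app": "netflix", "impression_app": "spotify"},
                               tables, hidden_w, hidden_b, out_w, out_b)
print(score_seen_pair, score_unseen_pair)  # nearly equal: similar embeddings generalize
```

Because the score depends on embedding geometry rather than exact ids, a pair that never appeared together still gets a sensible score. The same mechanism is what can push unrelated items up when embeddings are poorly constrained by sparse data.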

Joint training

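Both parts feed one shared output: their logits are summed with a bias and passed through a single sigmoid, so one logistic loss is computed and its gradient is backpropagated into both parts at once. Unlike an ensemble, neither part is trained in isolation. In the original Google Play system the wide side was trained with FTRL with L1 regularization, an optimizer suited to sparse features, while the deep side used AdaGrad.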

  • The wide model patches the deep model when it overgeneralizes.
  • The deep model fills gaps the wide model cannot reach.
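One joint training step can be sketched as follows. It assumes a single example, reuses a dictionary of wide weights over active binary features, and treats the deep side's last hidden activations as fixed so that only its output layer is updated. Plain SGD is used for both parts purely for brevity; this is a simplification, not the FTRL/AdaGrad setup described above. All names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def joint_step(active_wide, wide_w, hidden, deep_out_w, bias, label, lr=0.1):
    """One joint SGD step on a single example (simplified sketch).

    The wide and deep logits are summed before one sigmoid, so a single log
    loss yields a single error signal (p - label) that updates both parts.
    Returns the prediction, the loss before the update, and the new bias.
    """
    wide = sum(wide_w.get(f, 0.0) for f in active_wide)
    deep = sum(w * h for w, h in zip(deep_out_w, hidden))
    p = sigmoid(wide + deep + bias)
    loss = -(label * math.log(p) + (1 - label) * math.log(1 - p))
    g = p - label  # gradient of the log loss w.r.t. the combined logit
    for f in active_wide:
        # each active binary feature has value 1, so its gradient is just g
        wide_w[f] = wide_w.get(f, 0.0) - lr * g
    for i, h in enumerate(hidden):
        deep_out_w[i] -= lr * g * h  # deep output layer gets the same g
    return p, loss, bias - lr * g


wide_w, deep_w, bias = {}, [0.0, 0.0], 0.0
hidden = [1.0, 0.5]  # hypothetical last-hidden-layer activations, held fixed
active = ["AND(installed_app=netflix,impression_app=pandora)"]
losses = []
for _ in range(3):
    p, loss, bias = joint_step(active, wide_w, hidden, deep_w, bias, label=1)
    losses.append(loss)
print(losses)  # decreasing: both parts move toward the positive label
```

The key design point is the shared error signal g: the wide weights and the deep weights are updated from the same loss, which is what lets each part specialize in what the other misses.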

Key idea

The wide and deep model jointly trains a linear cross-feature memorizer with a deep generalizer, so the system both recalls exact patterns and extends to unseen combinations.

Check yourself

1. What is the wide part good at?

2. How are the wide and deep parts trained?