Machine Learning

Gini Impurity and Entropy

Two ways to measure how mixed the labels are at a node.

To pick splits, a decision tree needs a number for how mixed the class labels at a node are. Gini impurity and entropy are the two standard impurity measures. Both are at their lowest, zero, when a node holds a single class.

Gini impurity

Gini equals one minus the sum of squared class proportions. If a node is all one class, that class has proportion one and its square is one, so Gini is zero. A perfectly even two-class split gives a Gini of one half, the maximum for two classes. With K classes the maximum is 1 - 1/K, reached when all classes are equally common.
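
As a rough illustration of the definition above, here is a minimal sketch in Python. The function name `gini_impurity` and the choice to treat an empty node as pure are assumptions made for this example, not part of the lesson.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: one minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node counts as pure
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())

pure_node = ["a", "a", "a", "a"]
even_node = ["a", "a", "b", "b"]

print(gini_impurity(pure_node))  # 0.0
print(gini_impurity(even_node))  # 0.5
```

The pure node scores zero and the even two-class node scores one half, matching the two cases described above.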

Entropy

Entropy sums, over the classes present at the node, each class proportion times the negative log of that proportion. It is also zero for a pure node and peaks when classes are evenly mixed. With base-2 logarithms, an even two-class split has an entropy of one bit. The reduction in entropy from a split is called information gain.
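
The same idea can be sketched in Python. This example assumes base-2 logarithms, so the result is in bits; the lesson does not fix a base, and the function name `entropy` is chosen here for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy in bits: sum of p * (-log2 p) over the classes present."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node counts as pure
    counts = Counter(labels)
    # Only classes that actually occur are counted, so log2(0) never arises.
    return sum((count / n) * -math.log2(count / n) for count in counts.values())

pure_node = ["a", "a", "a", "a"]
even_node = ["a", "a", "b", "b"]

print(entropy(pure_node))
print(entropy(even_node))
```

The pure node scores zero, and the even two-class node scores one bit. A different log base would rescale the values without changing which node is more mixed.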

How they compare

  • Both reward purity and punish mixed nodes.
  • Gini is slightly cheaper to compute because it avoids logarithms.
  • They usually pick the same or very similar splits, so the choice rarely changes the final tree much.
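
To see how closely the two measures track each other, the sketch below tabulates both for a two-class node as the proportion p of one class varies. The helper names `gini_binary` and `entropy_binary` are made up for this example, and base-2 logs are assumed.

```python
import math

def gini_binary(p):
    """Gini impurity of a two-class node where one class has proportion p."""
    return 1.0 - (p ** 2 + (1.0 - p) ** 2)

def entropy_binary(p):
    """Entropy in bits of a two-class node where one class has proportion p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a pure node; also avoids log2(0)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"p={p:.1f}  gini={gini_binary(p):.3f}  entropy={entropy_binary(p):.3f}")
```

Both columns are zero at the pure ends and largest at p = 0.5. They differ in scale and curvature but rise and fall together, which is why they tend to rank candidate splits similarly.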

Using the measure

For each candidate split, the tree computes the weighted average impurity of the children, weighting each child by its share of the parent's samples, and subtracts it from the parent impurity. The split with the biggest drop wins.
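
The scoring step above might look like the following sketch. It uses Gini as the impurity measure, and the names `impurity_drop`, `clean_split`, and `messy_split` are hypothetical, as is the toy data. A real tree learner would also have to generate the candidate splits, which is omitted here.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: one minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node counts as pure
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_drop(parent, children, impurity=gini):
    """Parent impurity minus the size-weighted average impurity of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted

parent = ["a", "a", "a", "b", "b", "b"]
clean_split = [["a", "a", "a"], ["b", "b", "b"]]  # children are pure
messy_split = [["a", "a", "b"], ["a", "b", "b"]]  # children stay mixed

drop_clean = impurity_drop(parent, clean_split)
drop_messy = impurity_drop(parent, messy_split)
print(drop_clean, drop_messy)

# The split with the biggest drop wins.
best = max([clean_split, messy_split], key=lambda s: impurity_drop(parent, s))
```

Here the clean split removes all of the parent's impurity, so it is chosen over the messy one. Passing an entropy function as `impurity` would turn the same drop into information gain.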

Key idea

Gini and entropy both score label mixing, are zero for pure nodes, and usually agree on the best split.

Check yourself

1. What is the Gini impurity of a node that holds only one class?

2. What is the reduction in entropy from a split called?

3. Why might Gini be preferred over entropy in practice?