Sequence Labeling With CRFs

Tagging sequences while respecting label transitions.

A conditional random field, or CRF, is a model built for sequence labeling tasks such as named entity recognition and part-of-speech tagging. Its strength is that it scores an entire label sequence at once rather than scoring each token independently.

Why does that matter? A token-by-token classifier that is unsure about a token can emit an illegal sequence, such as an I (continuation) label with no B (beginning) label before it. A CRF guards against this by learning transition scores between adjacent labels, capturing rules like "an I-location label cannot directly follow O".
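The illegal pattern described above can be made concrete with a small validity check. This is only a sketch: the tag names (O, B-LOC, I-LOC, B-PER) and the helper is_valid_bio are assumptions chosen for illustration, following the common B/I/O convention.

```python
def is_valid_bio(tags):
    """Return True if every I-X tag continues a B-X or I-X of the same type X."""
    prev = "O"  # treat the position before the sequence as outside any entity
    for tag in tags:
        if tag.startswith("I-"):
            entity = tag[2:]
            if prev not in (f"B-{entity}", f"I-{entity}"):
                return False
        prev = tag
    return True


print(is_valid_bio(["O", "B-LOC", "I-LOC", "O"]))  # True
print(is_valid_bio(["O", "I-LOC", "O"]))           # False: I-LOC directly after O
```

An independent per-token classifier has no mechanism that rules out the second sequence; a CRF can learn a strongly negative transition score for O followed by I-LOC.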

A linear chain CRF combines two ingredients:

  • Emission scores: how well a token matches a label given its features
  • Transition scores: how compatible neighboring labels are
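As a rough illustration of how the two ingredients combine, the score of one full label path can be computed by summing the emission score at each position and the transition score between each adjacent pair. The numbers, the label set, and the function name path_score below are made up for this sketch; a real model would learn the scores.

```python
import numpy as np

LABELS = ["O", "B-LOC", "I-LOC"]  # assumed label set for illustration

# emissions[t, k]: how well token t matches label k (made-up values)
emissions = np.array([[2.0, 0.5, 0.1],
                      [0.3, 1.0, 1.2],
                      [0.2, 0.1, 1.5]])

# transitions[i, j]: compatibility of label i followed by label j (made-up values)
transitions = np.array([[0.5,  0.2, -5.0],   # from O (O -> I-LOC is heavily penalized)
                        [0.0, -1.0,  1.0],   # from B-LOC
                        [0.3,  0.0,  0.5]])  # from I-LOC


def path_score(emissions, transitions, labels):
    """Sum emission and transition scores along one label path."""
    score = emissions[0, labels[0]]
    for t in range(1, len(labels)):
        score += transitions[labels[t - 1], labels[t]] + emissions[t, labels[t]]
    return float(score)


legal = [0, 1, 2]    # O, B-LOC, I-LOC
illegal = [0, 2, 2]  # O, I-LOC, I-LOC
print(path_score(emissions, transitions, legal))    # about 5.7
print(path_score(emissions, transitions, illegal))  # about 0.2
```

The illegal path scores far lower because of the single large negative transition, even though its emission scores are individually high.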

At prediction time, it finds the highest-scoring full label path with the Viterbi algorithm, which uses dynamic programming to search all sequences without enumerating them. Training maximizes the conditional probability of the correct sequence relative to all alternative label sequences for the same input.
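A minimal NumPy sketch of both steps follows, assuming emission and transition scores are already given as arrays (the values and function names are illustrative, not from any particular library). viterbi returns the best path, and log_partition computes the log of the summed exponentiated scores of all paths, which is the normalizer in the training objective.

```python
import numpy as np


def viterbi(emissions, transitions):
    """Return (best_path, best_score) for a linear-chain CRF.

    emissions: (T, K) array; transitions: (K, K) array, [i, j] = score of i -> j.
    """
    T, K = emissions.shape
    score = emissions[0].copy()              # best score of any path ending in each label
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # cand[i, j]: best path ending in label i at t-1, then moving to label j at t
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):            # follow back-pointers from the last position
        path.append(int(backptr[t, path[-1]]))
    path.reverse()
    return path, float(score.max())


def log_partition(emissions, transitions):
    """Log-sum-exp over the scores of all label paths (forward algorithm)."""
    T, K = emissions.shape
    alpha = emissions[0].copy()
    for t in range(1, T):
        cand = alpha[:, None] + transitions + emissions[t][None, :]
        m = cand.max(axis=0)                 # subtract the max for numerical stability
        alpha = m + np.log(np.exp(cand - m).sum(axis=0))
    m = alpha.max()
    return float(m + np.log(np.exp(alpha - m).sum()))


# Made-up scores for labels 0 = O, 1 = B-LOC, 2 = I-LOC
emissions = np.array([[2.0, 0.5, 0.1],
                      [0.3, 1.0, 1.2],
                      [0.2, 0.1, 1.5]])
transitions = np.array([[0.5,  0.2, -5.0],
                        [0.0, -1.0,  1.0],
                        [0.3,  0.0,  0.5]])

greedy = emissions.argmax(axis=1).tolist()   # each token decided in isolation
best_path, best_score = viterbi(emissions, transitions)
print(greedy)     # [0, 2, 2]: O, I-LOC, I-LOC (illegal)
print(best_path)  # [0, 1, 2]: O, B-LOC, I-LOC

# Negative log-likelihood of a gold path = log_partition - score of that path
nll = log_partition(emissions, transitions) - best_score
```

In this toy example the per-token argmax yields an I-LOC directly after O, while Viterbi trades a slightly lower emission score at the second token for a much better transition.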

A popular modern design stacks a CRF on top of a neural encoder. The network supplies rich, context-aware emission scores, and the CRF layer enforces globally consistent label transitions. This combination was long a top performer on tagging benchmarks.

The core lesson is that nearby labels are dependent, and modeling those dependencies beats deciding each token in isolation.

Key idea

A CRF scores whole label sequences using emission and transition scores, enforcing valid tag orders that per-token classifiers cannot guarantee.

Check yourself

1. What does a CRF add over independent token classification?

2. Which algorithm finds the best label path in a linear chain CRF?

3. Why pair a neural encoder with a CRF layer?