Sequence Labeling With CRFs
A conditional random field (CRF) is a model for sequence labeling tasks such as named entity recognition and part-of-speech tagging. Its strength is that it scores an entire label sequence jointly rather than labeling each token independently.
Why does that matter? Suppose a token-by-token classifier is unsure. It might emit an illegal sequence, such as an inside (I) tag with no beginning (B) tag before it. A CRF guards against this by learning transition scores between adjacent labels, so a rule like "I-LOC cannot directly follow O" becomes a large penalty on that transition.
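To make the rule concrete, here is a minimal hand-written check, assuming the common BIO (IOB2) scheme with tags like B-LOC and I-LOC; the function names are hypothetical. A CRF does not hard-code this logic; it learns transition scores that penalize the same illegal moves.

```python
# Hypothetical helpers illustrating the BIO rule: an I-X tag must follow
# B-X or I-X of the same entity type X.

def transition_allowed(prev: str, curr: str) -> bool:
    """Return True if tag `curr` may directly follow tag `prev` under BIO."""
    if not curr.startswith("I-"):
        return True  # O and B-X may follow any tag
    entity = curr[2:]
    return prev in (f"B-{entity}", f"I-{entity}")


def is_valid_bio(tags: list[str]) -> bool:
    """Check a whole sequence; the start behaves like an O, so a leading I-X is illegal."""
    prev = "O"
    for tag in tags:
        if not transition_allowed(prev, tag):
            return False
        prev = tag
    return True


print(is_valid_bio(["B-LOC", "I-LOC", "O"]))  # True
print(is_valid_bio(["O", "I-LOC", "O"]))      # False: I-LOC directly after O
print(is_valid_bio(["B-PER", "I-LOC"]))       # False: entity type changes
```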
A linear chain CRF combines two ingredients:
- Emission scores: how well a token matches a label, given its features
- Transition scores: how compatible neighboring labels are
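The score of one candidate label path is the sum of both ingredients along the sequence. The sketch below assumes labels are integer indices, `emissions[t][y]` holds the score of label `y` at position `t`, and `transitions[a][b]` holds the score of label `b` following label `a`; the numbers are made up. Real implementations often add start and end transition scores as well, which are omitted here for simplicity.

```python
def sequence_score(emissions, transitions, tags):
    """Sum emission and transition scores along one label path."""
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]]  # neighbor compatibility
        score += emissions[t][tags[t]]              # token/label fit
    return score


# Toy example: 3 tokens, 2 labels (0 = O, 1 = B-LOC); scores are illustrative.
emissions = [[1.0, 0.5],
             [0.2, 2.0],
             [1.5, 0.1]]
transitions = [[0.3, -0.2],
               [0.4, -1.0]]
score = sequence_score(emissions, transitions, [0, 1, 0])
print(round(score, 6))  # 4.7
```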
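At prediction time, it finds the highest-scoring full path of labels with the Viterbi algorithm, which uses dynamic programming to pick the best of all possible sequences without enumerating them. Training maximizes the conditional probability of the correct sequence relative to all alternatives; the normalizing sum over every possible sequence is computed efficiently with the forward algorithm.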
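Both computations can be sketched in plain Python, assuming the same score layout as a basic linear-chain CRF (no start/end scores) and hypothetical function names. Viterbi keeps, for each label, the best score of any prefix ending in that label; the forward algorithm has the same structure but replaces max with log-sum-exp, giving log Z, the log of the summed exponentiated scores of all paths. The log-likelihood that training maximizes is then the correct path's score minus log Z.

```python
import math
from itertools import product


def path_score(emissions, transitions, tags):
    """Emission plus transition scores along one label path."""
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def viterbi(emissions, transitions):
    """Best path in O(n * K^2) time instead of scoring all K^n paths."""
    k = len(emissions[0])
    best = list(emissions[0])  # best[y]: top score of any prefix ending in label y
    backptrs = []
    for row in emissions[1:]:
        # for each current label y, remember the previous label that maximizes the score
        ptrs = [max(range(k), key=lambda p: best[p] + transitions[p][y]) for y in range(k)]
        best = [best[ptrs[y]] + transitions[ptrs[y]][y] + row[y] for y in range(k)]
        backptrs.append(ptrs)
    last = max(range(k), key=lambda y: best[y])
    path = [last]
    for ptrs in reversed(backptrs):  # follow back-pointers from the end
        path.append(ptrs[path[-1]])
    path.reverse()
    return best[last], path


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_partition(emissions, transitions):
    """Forward algorithm: log Z over all label paths (max replaced by log-sum-exp)."""
    k = len(emissions[0])
    alpha = list(emissions[0])
    for row in emissions[1:]:
        alpha = [logsumexp([alpha[p] + transitions[p][y] for p in range(k)]) + row[y]
                 for y in range(k)]
    return logsumexp(alpha)


def log_likelihood(emissions, transitions, tags):
    """log P(tags | x) = score(tags) - log Z; training maximizes this."""
    return path_score(emissions, transitions, tags) - log_partition(emissions, transitions)


# Toy scores (made up): 3 tokens, 2 labels.
emissions = [[1.0, 0.5], [0.2, 2.0], [1.5, 0.1]]
transitions = [[0.3, -0.2], [0.4, -1.0]]

score, path = viterbi(emissions, transitions)
print(path)  # [0, 1, 0]

# Sanity check: the probabilities of all 2**3 paths sum to 1.
total = sum(math.exp(log_likelihood(emissions, transitions, list(tags)))
            for tags in product(range(2), repeat=3))
print(round(total, 6))  # 1.0
```

Subtracting the maximum inside `logsumexp` keeps the exponentials from overflowing when scores are large, which is why practical CRF code works in log space.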
A popular modern design stacks a CRF on top of a neural encoder, such as a bidirectional LSTM. The network supplies rich, context-aware emission scores, and the CRF layer enforces globally consistent label transitions. This combination was long a strong performer on tagging benchmarks.
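The division of labor can be illustrated without a real network. In the sketch below, a hard-coded emission matrix stands in for a hypothetical encoder's output for the tokens "New" and "York", and the transition and start scores use -inf to forbid I-LOC after O or at the start. Both the numbers and the hard constraints are assumptions for illustration; a trained CRF learns finite penalties instead. Per-token argmax yields the illegal O, I-LOC, while Viterbi decoding over the same emissions returns B-LOC, I-LOC.

```python
import math

LABELS = ["O", "B-LOC", "I-LOC"]
NEG_INF = -math.inf

# Stand-in for encoder output: one row of emission scores per token (made up).
emissions = [[1.2, 1.0, 0.1],   # "New"
             [0.2, 0.5, 2.0]]   # "York"

# Zero everywhere except the forbidden O -> I-LOC transition (assumed hard constraint).
transitions = [[0.0, 0.0, NEG_INF],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
start = [0.0, 0.0, NEG_INF]  # a sequence may not begin with I-LOC


def viterbi(emissions, transitions, start):
    """Best label path under emission, transition, and start scores."""
    k = len(start)
    best = [start[y] + emissions[0][y] for y in range(k)]
    backptrs = []
    for row in emissions[1:]:
        ptrs = [max(range(k), key=lambda p: best[p] + transitions[p][y]) for y in range(k)]
        best = [best[ptrs[y]] + transitions[ptrs[y]][y] + row[y] for y in range(k)]
        backptrs.append(ptrs)
    path = [max(range(k), key=lambda y: best[y])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return path[::-1]


greedy = [LABELS[max(range(3), key=lambda y: row[y])] for row in emissions]
crf = [LABELS[y] for y in viterbi(emissions, transitions, start)]
print(greedy)  # ['O', 'I-LOC']
print(crf)     # ['B-LOC', 'I-LOC']
```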
The core lesson is that neighboring labels depend on each other, and modeling those dependencies tends to beat deciding each token in isolation.
Key idea
A CRF scores whole label sequences using emission and transition scores, steering predictions toward valid tag orders that per-token classifiers cannot guarantee.