What NER does
Named entity recognition (NER) scans text and marks spans that refer to real-world entities such as people, organizations, and locations, and often also dates and monetary amounts. The output is a label per token, and together those labels determine the type and boundaries of each span.
The tagging scheme
Most systems frame NER as sequence labeling with a BIO scheme.
- B marks the beginning of an entity.
- I marks a token inside the same entity.
- O marks a token outside any entity.
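Each B and I tag also carries an entity type such as LOC, so "New York City" becomes B-LOC I-LOC I-LOC, which keeps a multi-word name together as one unit. Because every entity opens with a B tag, two adjacent entities of the same type stay separate.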
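The step from per-token tags back to entity spans can be sketched as below. This is a minimal illustration, not a reference implementation: the function name bio_to_spans is hypothetical, spans are returned as (start, end, label) with an exclusive end index, and a stray I tag with no open entity is simply ignored, which is one assumption among several reasonable policies.

```python
from typing import List, Tuple


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert BIO tags into (start, end_exclusive, label) spans."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open entity continues only on an I tag with the same type.
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.append((start, i, label))  # close the open entity
                start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]  # open a new entity
            # Assumption: an I tag with no matching open entity is dropped.
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


tokens = ["She", "moved", "to", "New", "York", "City", "."]
tags = ["O", "O", "O", "B-LOC", "I-LOC", "I-LOC", "O"]
spans = bio_to_spans(tags)
entities = [(" ".join(tokens[s:e]), lab) for s, e, lab in spans]
print(entities)
```

For the example sentence this yields a single LOC span covering tokens 3 to 5, so "New York City" comes back as one entity rather than three.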
How models learn it
- Encode each token with its context, classically using hand-crafted features, now usually using a transformer.
- Predict a tag per token, often with a CRF layer on top to keep tag sequences valid.
- The CRF scores tag-to-tag transitions, so illegal jumps such as an I tag directly after an O tag are learned to score low or are ruled out with hard constraints.
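The effect of forbidding invalid transitions can be sketched with a small constrained Viterbi decoder. This is a simplification under stated assumptions: a real CRF also learns a score for every transition, while this sketch uses only per-token scores and a hard yes/no rule. The names allowed and constrained_viterbi and the toy scores are hypothetical.

```python
import math
from typing import Dict, List


def allowed(prev: str, curr: str) -> bool:
    """BIO rule: I-X may only follow B-X or I-X; anything else is free."""
    if curr.startswith("I-"):
        return prev in ("B-" + curr[2:], "I-" + curr[2:])
    return True


def constrained_viterbi(scores: List[Dict[str, float]]) -> List[str]:
    """Best-scoring tag path that never takes a forbidden transition."""
    if not scores:
        return []
    tags = list(scores[0])
    # A sequence cannot open with an I tag (treated as following "O").
    best = {t: (scores[0][t] if allowed("O", t) else -math.inf) for t in tags}
    back = []
    for step in scores[1:]:
        new_best, pointers = {}, {}
        for curr in tags:
            # Consider only predecessors that may legally precede curr.
            candidates = [(best[p], p) for p in tags if allowed(p, curr)]
            prev_score, prev_tag = max(candidates)
            new_best[curr] = prev_score + step[curr]
            pointers[curr] = prev_tag
        best = new_best
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(best, key=best.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy per-token scores for "to New York" (made-up numbers).
scores = [
    {"O": 2.0, "B-LOC": 0.5, "I-LOC": 0.1},
    {"O": 1.0, "B-LOC": 0.9, "I-LOC": 1.5},
    {"O": 0.2, "B-LOC": 0.1, "I-LOC": 2.0},
]
greedy = [max(s, key=s.get) for s in scores]
decoded = constrained_viterbi(scores)
print(greedy)
print(decoded)
```

On these toy scores, picking the best tag per token independently gives O I-LOC I-LOC, which is not a valid BIO sequence. The constrained decoder instead returns O B-LOC I-LOC, the best-scoring path that respects the rule.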
Where it is hard
- The same word can name a person or a place depending on context, as with "Washington".
- New names never seen in training still need correct labels.
Key idea
NER is sequence labeling with a BIO scheme that turns per-token tags into clean entity spans, often guarded by a CRF that penalizes or forbids invalid tag transitions.