The task
Machine translation converts text from a source language to a target language while preserving meaning. Modern systems are neural and trained end to end on parallel sentence pairs.
Encoder-decoder with attention
- The encoder reads the source sentence into contextual vectors.
- The decoder generates the target one token at a time.
- Attention lets the decoder weight the relevant source positions at each target step, which removes the bottleneck of squeezing a long sentence into one fixed-size vector.
Transformers dropped recurrence, making the model fully attention-based and highly parallel.
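To make the attention step concrete, here is a minimal sketch in pure Python, assuming scaled dot-product scoring (one common choice; the notes do not name a specific scoring function). The names `attend`, `encoder_states`, and `decoder_state` are hypothetical, and the vectors are made-up toy values rather than outputs of a trained model.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Score each source position by a scaled dot product with the query.
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The context vector is the attention-weighted average of the values.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

# Toy encoder outputs, one vector per source token (illustrative values).
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Toy decoder state for the current target step (illustrative values).
decoder_state = [2.0, 0.0]

weights, context = attend(decoder_state, encoder_states, encoder_states)
print(weights)
print(context)
```

The weights sum to one, so the decoder receives a fresh mixture of source vectors at every step instead of a single summary of the whole sentence.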
Decoding the output
- Greedy decoding takes the most likely next token at each step, which can commit early to a choice that leads to a worse overall translation.
- Beam search keeps several partial translations and expands the highest-scoring ones at each step, trading compute for quality.
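The difference between the two strategies can be sketched on a toy example. The block below assumes a hypothetical lookup table `TOY_MODEL` standing in for a real decoder's next-token distribution; the probabilities are invented so that the locally best first token leads to a lower-probability sentence. A hypothesis counts as finished when the table has no continuation for it, which is a simplification of a real end-of-sequence token.

```python
import math

# Hypothetical next-token probabilities, keyed by the prefix so far.
TOY_MODEL = {
    (): {"the": 0.6, "a": 0.4},
    ("the",): {"cat": 0.55, "dog": 0.45},
    ("a",): {"cat": 0.9, "dog": 0.1},
}

def next_token_probs(prefix):
    # An empty dict means the hypothesis is finished.
    return TOY_MODEL.get(tuple(prefix), {})

def greedy_decode():
    prefix, logp = [], 0.0
    while True:
        probs = next_token_probs(prefix)
        if not probs:
            return prefix, logp
        # Commit to the single most likely next token.
        token = max(probs, key=probs.get)
        logp += math.log(probs[token])
        prefix = prefix + [token]

def beam_search(beam_width):
    beams = [([], 0.0)]  # (tokens, cumulative log-probability)
    while True:
        candidates = []
        expanded = False
        for prefix, logp in beams:
            probs = next_token_probs(prefix)
            if not probs:
                # Finished hypotheses carry over unchanged.
                candidates.append((prefix, logp))
                continue
            expanded = True
            for token, p in probs.items():
                candidates.append((prefix + [token], logp + math.log(p)))
        # Keep only the best `beam_width` hypotheses.
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
        if not expanded:
            return beams[0]

print(greedy_decode())
print(beam_search(beam_width=2))
```

Greedy picks "the" first because 0.6 beats 0.4, then ends with a sentence of probability 0.33, while a beam of width 2 keeps "a" alive and finds the sentence with probability 0.36. With `beam_width=1` the search reduces to greedy decoding.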
Evaluation and hard cases
- BLEU scores n-gram overlap with human references and is the long-standing benchmark metric.
- Low-resource language pairs lack parallel data, so translation quality drops sharply.
- Idioms and word order differences between languages remain hard.
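As a rough illustration of how n-gram overlap scoring works, the sketch below computes a simplified BLEU-style score. It is not the standard metric: it assumes a single reference, sentence-level scoring, and n-grams up to length 2 (standard BLEU is usually computed over a corpus with n-grams up to 4 and may use several references). The function names `modified_precision` and `simple_bleu` are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Count every contiguous n-gram in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def modified_precision(candidate, reference, n):
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Clip each candidate n-gram count by its count in the reference,
    # so repeating a matching word cannot inflate the score.
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total

def simple_bleu(candidate, reference, max_n=2):
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    # Geometric mean of the n-gram precisions.
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    # Brevity penalty: punish candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c >= r else math.exp(1 - r / c)
    return geo_mean * brevity_penalty

reference = "the cat sat on the mat".split()
print(simple_bleu("the cat sat on the mat".split(), reference))
print(simple_bleu("the cat sat".split(), reference))
print(modified_precision(["the"] * 6, reference, 1))
```

The clipping step is what stops a degenerate output like "the the the the the the" from scoring full unigram precision, and the brevity penalty is what stops a very short but accurate fragment from scoring as well as a complete translation.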
Key idea
Neural machine translation uses an attention-based encoder-decoder trained on parallel data, decodes with beam search, and is scored with BLEU, while low-resource pairs and idioms stay difficult.