A likelihood-based merge
WordPiece powers BERT and many other encoder models. Like BPE, it starts from individual characters and repeatedly merges adjacent pieces, but it does not pick the most frequent pair. Instead, it merges the pair that most increases the likelihood of the training corpus under a unigram language model.
The selection score
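Concretely, it merges the pair with the highest score, roughly the frequency of the pair divided by the product of the frequencies of its two parts. This favors pairs that occur together more often than their individual frequencies would predict by chance, so a rare pair whose parts almost always appear together can outrank a frequent pair built from very common parts.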
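The score described above can be sketched on a toy corpus. This is a minimal illustration, not a reference trainer: the corpus, the function names count_pieces_and_pairs and pair_scores, and the use of ## on non-initial pieces are all assumptions made for the example.

```python
from collections import Counter

def count_pieces_and_pairs(words):
    """Count pieces and adjacent piece pairs, weighted by word frequency."""
    piece_counts = Counter()
    pair_counts = Counter()
    for pieces, freq in words.items():
        for piece in pieces:
            piece_counts[piece] += freq
        for left, right in zip(pieces, pieces[1:]):
            pair_counts[(left, right)] += freq
    return piece_counts, pair_counts

def pair_scores(piece_counts, pair_counts):
    """Score each pair as count(pair) / (count(left) * count(right))."""
    return {
        (left, right): count / (piece_counts[left] * piece_counts[right])
        for (left, right), count in pair_counts.items()
    }

# Hypothetical toy corpus: each word is a tuple of its current pieces,
# mapped to how often the word occurs.
words = {
    ("t", "##h", "##e"): 20,
    ("t", "##o"): 10,
    ("s", "##h", "##e"): 8,
    ("q", "##u", "##o"): 3,
}

piece_counts, pair_counts = count_pieces_and_pairs(words)
scores = pair_scores(piece_counts, pair_counts)

best_by_score = max(scores, key=scores.get)
most_frequent = max(pair_counts, key=pair_counts.get)
print(best_by_score)   # ('q', '##u')
print(most_frequent)   # ('##h', '##e')
```

In this corpus a frequency-based rule would merge ##h and ##e first, while the score picks q and ##u, because those two pieces never appear apart.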
The continuation marker
WordPiece marks pieces that continue a word with a prefix, such as the double hash ## used by BERT. A split of playing might therefore become play and the continuation piece ##ing, which lets detokenization rejoin them cleanly.
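Rejoining marked pieces can be sketched in a few lines. The function name join_pieces and the default marker ## are assumptions for illustration. Real detokenizers also handle punctuation and special tokens, which this sketch ignores.

```python
def join_pieces(pieces, marker="##"):
    """Glue continuation pieces onto the preceding piece; separate words with spaces."""
    words = []
    for piece in pieces:
        if piece.startswith(marker) and words:
            # Continuation piece: strip the marker and append to the current word.
            words[-1] += piece[len(marker):]
        else:
            # A piece without the marker starts a new word.
            words.append(piece)
    return " ".join(words)

print(join_pieces(["play", "##ing", "is", "fun"]))  # playing is fun
```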
Tokenizing
At inference, WordPiece tokenizes each word by greedy longest match from the front. It peels off the longest prefix found in the vocabulary, then repeats on the remainder, where every piece after the first carries the continuation marker.
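The greedy longest-match loop can be sketched as follows. The vocabulary here is a hypothetical toy set, and the function name wordpiece_tokenize is an assumption. The fallback of mapping the whole word to an unknown token when some remainder cannot be matched follows the behavior commonly seen in BERT-style tokenizers, and is likewise an assumption rather than something the text above specifies.

```python
def wordpiece_tokenize(word, vocab, marker="##", unk="[UNK]"):
    """Split one word into pieces by greedy longest match from the front."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink from the right.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces are looked up with the continuation marker.
                candidate = marker + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # Assumed fallback: no piece fits, so the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces

# Hypothetical toy vocabulary.
vocab = {"play", "un", "##ing", "##ed", "##play", "##able"}

print(wordpiece_tokenize("playing", vocab))     # ['play', '##ing']
print(wordpiece_tokenize("unplayable", vocab))  # ['un', '##play', '##able']
print(wordpiece_tokenize("xyz", vocab))         # ['[UNK]']
```

Because the match is greedy, the result depends only on the vocabulary, not on the order in which merges were learned during training.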
Key idea
WordPiece merges by likelihood gain rather than raw frequency and uses continuation markers so pieces can be rejoined into words.