What GPT is
GPT is a decoder-only transformer trained to predict the next token in a sequence, which makes it a natural text generator.
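- It uses causal (masked) self-attention, so each position attends only to itself and earlier positions, never to later ones.
- This left-to-right constraint keeps training consistent with autoregressive generation: no prediction can depend on tokens that have not yet been produced.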
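As a rough illustration of the masking described above, the sketch below builds a lower-triangular mask and applies it before a row-wise softmax. The function names causal_mask and masked_softmax, and the uniform toy scores, are assumptions made for this example; a real model computes scores from learned query and key projections.

```python
import math

def causal_mask(n):
    """Return an n x n matrix; entry [i][j] is True when position i may attend to j."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask):
    """Row-wise softmax over raw attention scores, ignoring masked (future) positions."""
    weights = []
    for row, allowed in zip(scores, mask):
        # Masked positions get -inf, so exp() sends their weight to exactly 0.
        masked = [s if ok else float("-inf") for s, ok in zip(row, allowed)]
        peak = max(masked)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Toy example (made-up scores): uniform raw scores for a 4-token sequence.
scores = [[0.0] * 4 for _ in range(4)]
weights = masked_softmax(scores, causal_mask(4))
for row in weights:
    print([round(w, 2) for w in row])
```

With uniform scores, the first position puts all of its weight on itself, and each later position spreads its weight evenly over itself and the positions before it. Every entry above the diagonal is zero.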
Pretraining
The objective is plain next-token prediction: maximize the probability of the true next token given all the tokens before it. Trained at scale on large text corpora, this yields broad language ability.
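One common way to write this objective is as the average negative log-likelihood (cross-entropy) of each true next token. The sketch below assumes the model has already produced one vector of logits per position; the function name next_token_loss, the 3-word vocabulary, and the logits are all hypothetical values chosen for illustration.

```python
import math

def next_token_loss(logits_per_step, token_ids):
    """Average negative log-likelihood of each true next token.

    logits_per_step[t] holds the scores over the vocabulary after reading
    token_ids[: t + 1]; the target for that step is token_ids[t + 1].
    """
    targets = token_ids[1:]
    total = 0.0
    for logits, target in zip(logits_per_step, targets):
        peak = max(logits)  # log-sum-exp with the max subtracted for stability
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_norm - logits[target]  # equals -log p(target | prefix)
    return total / len(targets)

# Hypothetical 3-word vocabulary and made-up logits, for illustration only.
token_ids = [0, 2, 1]
uniform = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]      # model has no preference
confident = [[0.0, 0.0, 5.0], [0.0, 5.0, 0.0]]    # model favors the true tokens
loss_uniform = next_token_loss(uniform, token_ids)
loss_confident = next_token_loss(confident, token_ids)
print(loss_uniform, loss_confident)
```

The uniform predictions give a loss of log(3), the cost of guessing among three equally likely tokens, while the predictions that favor the true tokens give a much smaller loss. Minimizing this quantity is equivalent to maximizing the probability of the true next token.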
Generation
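At inference the model predicts a distribution over the next token, samples one token from it, appends that token to the context, and repeats. Sampling controls such as temperature trade diversity against coherence: a lower temperature concentrates probability on the most likely tokens, while a higher temperature flattens the distribution.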
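Unlike BERT, an encoder-only model trained with masked-token prediction and not designed for free-form generation, GPT can both interpret a prompt and generate text. This underpins modern chat assistants after further instruction tuning.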
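The generation loop in this section can be sketched as follows. Everything named here is an assumption for the sake of the example: toy_model is a hypothetical stand-in for a trained GPT that returns next-token logits, the optional eos_id stop token is not part of the description above, and a temperature of zero is treated as greedy (argmax) decoding.

```python
import math
import random

def sample_with_temperature(logits, temperature, rng):
    """Sample a token id from softmax(logits / temperature)."""
    if temperature <= 0:
        # Assumption: treat zero temperature as greedy decoding (argmax).
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def generate(model, prompt, max_new_tokens, temperature=1.0, eos_id=None, seed=0):
    """Autoregressive loop: predict, sample, append, repeat."""
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = model(tokens)  # scores for the next token given the context so far
        next_id = sample_with_temperature(logits, temperature, rng)
        tokens.append(next_id)  # feed the sampled token back in as new context
        if next_id == eos_id:   # hypothetical stop token; None disables it
            break
    return tokens

def toy_model(tokens):
    """Hypothetical stand-in for a trained GPT: favors (last token + 1) mod 5."""
    vocab_size = 5
    logits = [0.0] * vocab_size
    logits[(tokens[-1] + 1) % vocab_size] = 3.0
    return logits

greedy = generate(toy_model, [0], max_new_tokens=4, temperature=0.0)
sampled = generate(toy_model, [0], max_new_tokens=4, temperature=1.5)
print(greedy, sampled)
```

With greedy decoding the toy model always follows its favored token, so the output counts upward from the prompt. At a temperature above 1 the distribution is flatter, so the sampled sequence may deviate from that pattern.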
Key idea
GPT is a causal decoder trained on next-token prediction, so it generates text autoregressively by feeding each predicted token back in as new context.