From predictor to assistant
A base language model only predicts the next token. It can complete text, but it will not reliably follow an instruction such as "summarize this" or "answer this question". Instruction tuning fine-tunes the model on many examples of instructions paired with good responses.
The dataset shape
Each example contains an instruction, an optional input, and a target response.
- Examples cover diverse tasks such as translation, summarization, and reasoning.
- The model learns the general skill of mapping a request to a helpful answer.
- Variety matters more than any single task, so the behavior generalizes.
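The record shape described above can be sketched in a few lines. This is a minimal illustration, not a standard format: the class name `InstructionExample`, the helper `format_prompt`, and the "Instruction / Input / Response" template are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstructionExample:
    """One training record: an instruction, an optional input, a target response."""
    instruction: str
    response: str
    input: Optional[str] = None  # extra context; many tasks do not need one


def format_prompt(example: InstructionExample) -> str:
    """Render the instruction (and input, if present) as the text the model reads.

    The template below is an assumption made for illustration only.
    """
    parts = [f"Instruction: {example.instruction}"]
    if example.input:
        parts.append(f"Input: {example.input}")
    parts.append("Response:")
    return "\n".join(parts)


# Two different tasks share the same record shape.
examples = [
    InstructionExample("Translate to French.", "Bonjour", input="Hello"),
    InstructionExample("Name the capital of France.", "Paris"),
]

for ex in examples:
    print(format_prompt(ex), ex.response)
    print("---")
```

Keeping every task in one shared shape is what lets a single dataset mix translation, summarization, and reasoning examples.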
The training flow
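One common way to turn an example into a training signal is sketched below, under stated assumptions: the prompt and the target response are concatenated into one sequence, and the next-token loss is averaged over the response tokens only. Scoring only the response is a frequent choice but not the only one, and the source does not specify it. Whitespace splitting stands in for a real tokenizer, the per-token probabilities are made-up numbers, and the names `build_training_sequence` and `masked_nll` are hypothetical.

```python
import math
from typing import List, Tuple


def build_training_sequence(prompt: str, response: str) -> Tuple[List[str], List[int]]:
    """Concatenate prompt and response tokens and mark which ones are scored.

    Whitespace splitting is a stand-in for a real tokenizer (assumption).
    """
    prompt_tokens = prompt.split()
    response_tokens = response.split()
    tokens = prompt_tokens + response_tokens
    # 0 = prompt token (ignored by the loss), 1 = response token (scored).
    loss_mask = [0] * len(prompt_tokens) + [1] * len(response_tokens)
    return tokens, loss_mask


def masked_nll(token_probs: List[float], loss_mask: List[int]) -> float:
    """Average negative log-likelihood over the scored (response) tokens only."""
    scored = [-math.log(p) for p, m in zip(token_probs, loss_mask) if m == 1]
    if not scored:
        raise ValueError("no response tokens to score")
    return sum(scored) / len(scored)


tokens, mask = build_training_sequence(
    "Instruction: Translate to French. Input: Hello Response:", "Bonjour"
)

# Made-up probabilities the model assigned to each correct token.
probs = [0.9] * 7 + [0.5]
loss = masked_nll(probs, mask)
print(tokens)
print(mask)
print(loss)
```

Fine-tuning would repeat this over many examples and update the model's weights to lower the loss, which pushes the model toward producing the target response when it sees the instruction.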
Why it generalizes
Because the data spans many tasks phrased as instructions, the model learns the instruction format and how to infer intent rather than memorizing answers. It can then follow instructions for tasks it never saw during tuning, which is a key step toward a usable assistant.
Key idea
Instruction tuning fine-tunes a base model on diverse instruction-response pairs so that it learns to follow natural-language requests and generalizes to new tasks.