Instruction Tuning

From predictor to assistant

A base language model only predicts the next token. It can complete text, but it will not reliably follow an instruction such as "summarize this" or "answer this question". Instruction tuning fine-tunes the model on many examples of instructions paired with good responses.

The dataset shape

Each example contains an instruction, an optional input, and a target response.

Examples cover diverse tasks such as translation, summarization, and reasoning.
The model learns the general skill of mapping a request to a helpful answer.
Variety across tasks matters more than depth on any single task, because variety is what lets the behavior generalize.
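
As a minimal sketch of the record shape described above, the following Python represents one example and renders the text the model would condition on. The class name, the field names, and the "Instruction: / Input: / Response:" template are hypothetical choices made for illustration; real datasets use a variety of templates.

```python
from dataclasses import dataclass


@dataclass
class InstructionExample:
    """One training record: an instruction, an optional input, and a target response."""
    instruction: str
    response: str
    input_text: str = ""  # left empty when the instruction needs no extra context


def build_prompt(example: InstructionExample) -> str:
    """Render the prompt part of a record using a hypothetical template."""
    parts = [f"Instruction: {example.instruction}"]
    if example.input_text:
        # The input line is included only when the record carries an input.
        parts.append(f"Input: {example.input_text}")
    parts.append("Response:")
    return "\n".join(parts)


# Two illustrative records from different tasks (made up for this sketch).
examples = [
    InstructionExample(
        instruction="Translate to French.",
        input_text="Good morning",
        response="Bonjour",
    ),
    InstructionExample(
        instruction="Name the largest planet in the solar system.",
        response="Jupiter",
    ),
]

for ex in examples:
    print(build_prompt(ex))
    print(ex.response)
    print("---")
```

Keeping the input optional lets one template serve both tasks that operate on a given text, such as translation, and tasks where the instruction alone is enough.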

The training flow
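
One common way to set up the fine-tuning step, shown here as an assumption rather than as the method this text prescribes, is to concatenate the prompt and the target response and compute the next-token loss only on the response tokens. The sketch below uses plain Python with made-up token ids and probabilities; the names `build_labels` and `masked_nll` are hypothetical, and the marker value -100 is a convention borrowed from common training libraries.

```python
import math

IGNORE = -100  # label value for positions excluded from the loss (assumed convention)


def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response; only response positions keep real labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def masked_nll(token_probs, labels):
    """Mean negative log-likelihood over positions whose label is not IGNORE.

    token_probs[i] is the probability the model assigned to the correct
    token at position i, given the tokens before it.
    """
    kept = [p for p, y in zip(token_probs, labels) if y != IGNORE]
    return -sum(math.log(p) for p in kept) / len(kept)


# Made-up token ids: three prompt tokens followed by two response tokens.
prompt_ids = [11, 12, 13]
response_ids = [21, 22]
input_ids, labels = build_labels(prompt_ids, response_ids)

# Made-up per-position probabilities; only the last two contribute to the loss.
probs = [0.9, 0.9, 0.9, 0.5, 0.25]
loss = masked_nll(probs, labels)
print(labels)
print(round(loss, 4))
```

Masking the prompt positions means the model is trained on producing the response rather than on reproducing the instruction text. Some setups train on all tokens instead, so treat the masking as one option.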

Why it generalizes

Because the data spans many tasks phrased as instructions, the model learns the instruction format and the intent behind a request rather than memorizing answers. It can then follow instructions for tasks it never saw explicitly, which is a key step toward a usable assistant.

Key idea

Instruction tuning fine-tunes a base model on diverse instruction-response pairs so that it learns to follow natural-language requests and generalizes to new tasks.
