Tuning by search, not by feel
Automated prompt optimization treats the prompt as something to search over. You define a metric on a labeled set, propose prompt variants, score them, and keep the winners. This replaces slow manual trial and error with a repeatable procedure.
How the loop runs
- Define a metric, such as accuracy or a graded score, on a held-out set.
- Propose variants by editing wording, swapping examples, or asking a model to mutate the current best.
- Evaluate each candidate on the set and record its score.
- Select and iterate, keeping top candidates and proposing new ones from them.
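
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `toy_predict` stands in for a real model call, `toy_mutate` stands in for a real proposal step, and every name, the beam width, and the round count are hypothetical choices made for the example.

```python
import random

def accuracy(prompt, examples, predict):
    """Metric: fraction of examples where the prediction matches the label exactly."""
    correct = sum(1 for x, y in examples if predict(prompt, x) == y)
    return correct / len(examples)

def optimize(seed_prompt, examples, predict, mutate,
             rounds=5, beam=2, children=3, rng=None):
    """Propose, score, and select for a fixed number of rounds."""
    rng = rng or random.Random(0)
    scored = {seed_prompt: accuracy(seed_prompt, examples, predict)}
    frontier = [seed_prompt]
    for _ in range(rounds):
        # Propose: derive new variants from the current top candidates.
        candidates = [mutate(p, rng) for p in frontier for _ in range(children)]
        # Evaluate: score each unseen candidate and record the result.
        for c in candidates:
            if c not in scored:
                scored[c] = accuracy(c, examples, predict)
        # Select: keep the best few as parents for the next round.
        frontier = sorted(scored, key=scored.get, reverse=True)[:beam]
    best = frontier[0]
    return best, scored[best]

# --- Toy stand-ins (assumptions for illustration only) ---
ANSWERS = {"2+2": "4", "3+5": "8", "7-4": "3", "6*2": "12"}
EXAMPLES = list(ANSWERS.items())
EDITS = ["Answer with digits only.", "Think step by step.", "Be concise."]

def toy_predict(prompt, question):
    # Pretend model: it only produces the bare answer when told to.
    if "digits only" in prompt.lower():
        return ANSWERS[question]
    return "The answer is " + ANSWERS[question]

def toy_mutate(prompt, rng):
    # Pretend proposal step: append one randomly chosen instruction.
    return prompt + " " + rng.choice(EDITS)

SEED = "Solve the problem."
best_prompt, best_score = optimize(SEED, EXAMPLES, toy_predict, toy_mutate)
```

Because the seed prompt stays in the scored pool, the selected prompt can never score worse than the starting point on this set. In a real setup `predict` would call a model and `mutate` would apply edits, swap examples, or ask a model to rewrite the current best.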
Families of methods
Some methods run gradient-free search over discrete edits, some have a model critique failures and rewrite the prompt, and some learn soft prompts as continuous vectors, which requires gradient access to the model. All share the same loop: propose, score, select.
Guard against overfitting
Optimizing hard on one small set can produce a prompt that wins there but fails in the wild. Hold out a separate test set that the search never sees, watch for prompts that exploit quirks of the evaluation examples, and confirm gains on fresh data before you trust the chosen prompt.
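
One way to make this check concrete is sketched below. It is a hedged illustration, not a prescribed procedure: the split ratio, the `confirm_gain` helper, and the toy model that simply memorizes question and answer pairs pasted into the prompt are all assumptions invented for the example.

```python
import random

def split(examples, test_fraction=0.3, seed=0):
    """Shuffle once, then carve off a test set the search never touches."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (dev, test)

def score(prompt, examples, predict):
    """Exact-match accuracy of a prompt on a set of (question, label) pairs."""
    return sum(predict(prompt, x) == y for x, y in examples) / len(examples)

def confirm_gain(baseline, candidate, dev, test, predict, min_gain=0.0):
    """Compare candidate and baseline on both sets; accept only on a test gain."""
    report = {
        "dev_gain": score(candidate, dev, predict) - score(baseline, dev, predict),
        "test_gain": score(candidate, test, predict) - score(baseline, test, predict),
    }
    report["accept"] = report["test_gain"] > min_gain
    return report

# --- Toy stand-ins (assumptions for illustration only) ---
def toy_predict(prompt, question):
    # Pretend model: it can only repeat "question=answer;" pairs found in the prompt.
    marker = question + "="
    if marker in prompt:
        return prompt.split(marker, 1)[1].split(";", 1)[0]
    return "?"

examples = [(f"q{i}", f"a{i}") for i in range(10)]
dev, test = split(examples)

baseline = "Answer the question. "
# An overfit candidate: it has memorized every dev example verbatim.
overfit = baseline + "".join(f"{q}={a};" for q, a in dev)

report = confirm_gain(baseline, overfit, dev, test, toy_predict)
```

Here the memorizing prompt shows a large gain on the set it was tuned on and no gain on the held-out test set, so `report["accept"]` is false. A real check would use more data and would likely account for noise in the scores rather than a bare threshold.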
Key idea
Automated prompt optimization searches prompt space against a metric through propose, score, and select loops, replacing manual tuning, while a held-out test set guards against overfitting to the small evaluation set.