One chain can be unlucky
A single chain of thought may take a wrong turn. Self consistency samples several independent reasoning paths for the same question and then picks the answer most of them agree on. It trades extra compute for robustness.
The recipe
- Use a chain of thought prompt and a nonzero temperature so paths differ.
- Sample several completions, each with its own reasoning.
- Extract the final answer from each path.
- Take the majority vote over those answers.
Why voting helps
Correct reasoning tends to converge on the same answer through different routes, while errors scatter. The mode of many samples is therefore more often right than any single greedy decode, especially on problems with one clean final value.
When it fits
It shines on tasks with a short discrete answer that is easy to compare, such as a number or a label. For open ended writing there is no clean vote, so the method does not apply as cleanly and costs add up fast.
Key idea
Self consistency samples several reasoning paths and takes the majority answer, since correct chains converge while errors scatter, buying reliability on discrete answer tasks at the cost of extra compute.