Why a rubric
A bare instruction to rate quality invites every grader to imagine a different standard. A rubric breaks quality into named criteria, each with a defined scale, so judgments become explicit and repeatable.
Anatomy of a rubric
- Criteria such as correctness, completeness, clarity, and safety.
- A scale per criterion, for example a one-to-five scale with a description for each level.
- Weights that reflect how much each criterion matters.
- Examples that anchor each level so graders calibrate the same way.
The final score is a weighted combination of the criterion scores, most commonly a weighted sum or weighted average.
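As a rough illustration of that combination, here is a minimal Python sketch. It assumes a weighted average of ratings normalized to the 0 to 1 range, which is one common choice rather than the only one. The `Criterion` class, the `weighted_score` function, and the specific weights and ratings are all hypothetical, chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric criterion (hypothetical structure for illustration)."""
    name: str
    weight: float       # relative importance; weights need not sum to 1
    min_level: int = 1  # lowest level on the scale
    max_level: int = 5  # highest level on the scale


def weighted_score(criteria, ratings):
    """Return the weighted average of normalized ratings, in [0, 1]."""
    total_weight = sum(c.weight for c in criteria)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    total = 0.0
    for c in criteria:
        level = ratings[c.name]
        if not c.min_level <= level <= c.max_level:
            raise ValueError(f"rating for {c.name!r} is off the scale: {level}")
        # Map the level onto [0, 1] so scales of different lengths are comparable.
        normalized = (level - c.min_level) / (c.max_level - c.min_level)
        total += c.weight * normalized
    return total / total_weight


# Assumed weights: correctness matters most, safety least (illustrative only).
rubric = [
    Criterion("correctness", weight=4),
    Criterion("completeness", weight=3),
    Criterion("clarity", weight=2),
    Criterion("safety", weight=1),
]
ratings = {"correctness": 5, "completeness": 3, "clarity": 4, "safety": 5}
score = weighted_score(rubric, ratings)
print(round(score, 3))  # 0.8
```

Dividing by the total weight keeps the result on the same 0 to 1 range no matter how the weights are scaled, so changing one weight shifts relative importance without changing what the maximum score means.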
Benefits
Rubrics make grading auditable: you can see why an answer lost points. They support partial credit instead of pass or fail, and they work for humans and LLM judges alike. Because criteria are explicit, disagreements become specific and fixable.
Failure modes
A rubric is only as good as its anchors. Vague levels collapse into personal taste. Too many criteria overwhelm graders and lower agreement. Carelessly chosen weights can let a trivial criterion dominate the final score. Pilot the rubric on a small sample, measure agreement between graders, and revise the anchors before scaling up.
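The text does not name an agreement statistic, so the sketch below assumes two common ones for a pilot with two graders: raw percent agreement and Cohen's kappa, which corrects for agreement expected by chance. The function names and the pilot ratings are hypothetical, made up for the example.

```python
from collections import Counter


def percent_agreement(a, b):
    """Fraction of items on which two graders gave the same level."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty rating lists of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: probability both graders pick the same level
    # if each assigned levels independently at their own observed rates.
    p_expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both graders used one identical level throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Made-up pilot: two graders rate the same eight answers on one criterion.
grader_a = [1, 2, 3, 3, 4, 5, 5, 2]
grader_b = [1, 2, 3, 4, 4, 5, 4, 2]
print(percent_agreement(grader_a, grader_b))       # 0.75
print(round(cohens_kappa(grader_a, grader_b), 3))  # 0.692
```

Plain kappa treats every disagreement as equally serious. On an ordered one-to-five scale, a weighted variant that penalizes a 3-versus-4 split less than a 1-versus-5 split may fit better. Computing agreement per criterion, rather than on the final score, shows which anchors need revising.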
Key idea
Rubric-based scoring decomposes quality into weighted, anchored criteria, producing auditable partial-credit grades, but only well-piloted anchors keep graders consistent.