What extractive summarization does
Extractive summarization builds a summary by selecting whole sentences from the source and stitching them together. It never invents new wording, so it cannot introduce facts that are absent from the source, although a sentence lifted out of its context can still mislead.
How sentences get ranked
- Frequency methods score sentences by how many important words they contain, where importance is usually estimated from word counts such as term frequency or TF-IDF.
- Graph methods like TextRank treat sentences as nodes, connect similar ones, and rank by centrality.
- Supervised methods train a classifier to predict whether each sentence belongs in the summary.
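The first two ranking families above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the stopword list is a tiny placeholder, the function names (tokenize, frequency_scores, textrank_scores) are hypothetical, and the graph version uses Jaccard word overlap as a stand-in similarity rather than the measure from the original TextRank formulation.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "on", "for"}

def tokenize(sentence):
    """Lowercase a sentence and return its non-stopword tokens."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def frequency_scores(sentences):
    """Score each sentence by the average document-wide count of its words."""
    tokens = [tokenize(s) for s in sentences]
    counts = Counter(w for sent in tokens for w in sent)
    return [sum(counts[w] for w in sent) / len(sent) if sent else 0.0
            for sent in tokens]

def jaccard(a, b):
    """Word-set overlap between two token lists, in the range 0..1."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def textrank_scores(sentences, damping=0.85, iterations=50):
    """Rank sentences by centrality in a similarity graph (PageRank-style iteration)."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Edge weight between two different sentences is their word overlap.
    weights = [[jaccard(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours, in proportion to edge weight.
        scores = [
            (1 - damping) / n + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n) if out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores
```

In both functions a sentence that shares vocabulary with the rest of the document scores higher than an isolated one. A supervised method would replace these hand-written scores with the output of a trained classifier.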
Selecting the final set
Picking only the top-scoring sentences tends to repeat the same point, because sentences about the central topic all score high together. Good systems therefore add a redundancy penalty, choosing sentences that are important yet different from those already chosen.
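One common way to apply such a penalty is a greedy loop in the spirit of Maximal Marginal Relevance: at each step, take the candidate whose importance score, minus a penalty for its overlap with sentences already picked, is highest. The sketch below is illustrative only. The names select_sentences and overlap and the penalty parameter are hypothetical, similarity is plain Jaccard word overlap, and the importance scores are assumed to lie roughly in the range 0 to 1 so they are comparable to the similarity values.

```python
import re

def word_set(sentence):
    """Return the set of lowercase words in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))

def overlap(a, b):
    """Jaccard similarity between the word sets of two sentences (0..1)."""
    wa, wb = word_set(a), word_set(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0

def select_sentences(sentences, scores, k, penalty=0.5):
    """Greedily pick up to k sentence indices, trading importance against redundancy."""
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:
        def adjusted(i):
            # Redundancy is the highest similarity to any sentence already chosen.
            redundancy = max(
                (overlap(sentences[i], sentences[j]) for j in selected),
                default=0.0,
            )
            return (1 - penalty) * scores[i] - penalty * redundancy
        best = max(candidates, key=adjusted)
        selected.append(best)
        candidates.remove(best)
    # Return indices in source order so the summary follows the original flow.
    return sorted(selected)

sentences = [
    "The river flooded the town.",
    "The river flooded the whole town.",
    "Schools will reopen on Monday.",
]
scores = [0.9, 0.8, 0.5]  # made-up importance scores for illustration
print(select_sentences(sentences, scores, k=2))             # → [0, 2]
print(select_sentences(sentences, scores, k=2, penalty=0))  # → [0, 1]
```

With the penalty switched off, the two near-duplicate flood sentences are both chosen. With it on, the second pick shifts to the lower-scoring but distinct sentence about schools.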
Strengths and limits
- Strength: every sentence is real source text, so faithfulness is high.
- Limit: the summary can read as choppy because sentences pulled from different places may not flow, and pronouns can lose their referents.
- Limit: it cannot compress or rephrase, so it tends to be wordier than abstractive output.
Key idea
Extractive summarization picks and orders important source sentences, staying faithful by construction but risking choppy, redundant output unless it penalizes overlap.