Machine Learning

Extractive Text Summarization

Selecting the most important sentences to form a faithful summary.

What extractive summarization does

Extractive summarization builds a summary by selecting whole sentences from the source and stitching them together. It never invents new wording, so it cannot introduce facts that are absent from the source, although a sentence lifted out of its context can still mislead.

How sentences get ranked

  • Frequency methods score each sentence by the important words it contains, typically words that occur often in the document once stopwords are removed.
  • Graph methods like TextRank treat sentences as nodes, connect similar ones, and rank by centrality.
  • Supervised methods train a classifier to predict whether each sentence belongs in the summary.
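
The first two ranking ideas above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the tiny stopword list, the Jaccard word-overlap similarity, and the function names (tokenize, frequency_scores, textrank_scores) are all assumptions made for the example. The original TextRank formulation uses a different overlap measure, and real systems typically use TF-IDF or embedding similarity.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use a fuller one).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "it", "on", "for", "are"}

def tokenize(sentence):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]

def frequency_scores(sentences):
    """Frequency method: average document-wide count of each sentence's words."""
    tokens = [tokenize(s) for s in sentences]
    counts = Counter(w for toks in tokens for w in toks)
    return [sum(counts[w] for w in toks) / len(toks) if toks else 0.0 for toks in tokens]

def jaccard(a, b):
    """Word-overlap similarity in [0, 1] (a simplification chosen for this sketch)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def textrank_scores(sentences, damping=0.85, iterations=50):
    """Graph method: sentences are nodes, edges are weighted by similarity,
    and scores come from a PageRank-style power iteration."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    weights = [[jaccard(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / out_sum[j] * scores[j]
                            for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores

sentences = [
    "The city council approved the new transit budget.",
    "The transit budget funds new bus routes across the city.",
    "Council members said the bus routes will open next year.",
    "A local bakery won a pastry award.",
]
freq = frequency_scores(sentences)
rank = textrank_scores(sentences)
```

In this toy input the bakery sentence shares no words with the others, so it sits alone in the graph and receives the lowest score under both methods, while the sentence that overlaps most with its neighbors ranks highest under the graph method.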

Selecting the final set

Picking only the top-scoring sentences tends to repeat the same point, because sentences that score high are often near-duplicates of each other. Good systems therefore add a redundancy penalty, choosing sentences that are important yet different from those already chosen.
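
One common way to apply such a penalty is a greedy loop in the spirit of Maximal Marginal Relevance: at each step, pick the sentence whose score, minus a penalty for its similarity to sentences already picked, is highest. The sketch below assumes importance scores already scaled to roughly 0 to 1. The names select_sentences, penalty, and k, and the word-overlap similarity, are hypothetical choices for illustration.

```python
import re

def words(sentence):
    """Set of lowercase word tokens in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))

def similarity(a, b):
    """Jaccard word overlap in [0, 1] (an illustrative choice of similarity)."""
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def select_sentences(sentences, scores, k=2, penalty=0.7):
    """Greedily pick k sentences, discounting each candidate by its
    highest similarity to any sentence already chosen."""
    chosen = []
    candidates = list(range(len(sentences)))
    while candidates and len(chosen) < k:
        def adjusted(i):
            redundancy = max(
                (similarity(sentences[i], sentences[j]) for j in chosen),
                default=0.0,
            )
            return scores[i] - penalty * redundancy
        best = max(candidates, key=adjusted)
        chosen.append(best)
        candidates.remove(best)
    # Return the picks in source order so the summary follows the document.
    return [sentences[i] for i in sorted(chosen)]

sentences = [
    "The storm closed the main bridge on Monday.",
    "The main bridge was closed by the storm on Monday.",
    "Ferry service will run extra trips this week.",
]
scores = [0.9, 0.85, 0.6]  # made-up importance scores for the example

with_penalty = select_sentences(sentences, scores, k=2, penalty=0.7)
without_penalty = select_sentences(sentences, scores, k=2, penalty=0.0)
```

With the penalty switched off, the two bridge sentences are both selected even though they say the same thing. With it on, the second bridge sentence is discounted heavily and the ferry sentence takes its place.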

Strengths and limits

  • Strength: every sentence is real source text, so faithfulness is high.
  • Limit: the summary can read choppy because sentences pulled from different places may not flow.
  • Limit: it cannot compress or rephrase sentences, so the summary tends to be wordier than abstractive output.

Key idea

Extractive summarization picks and orders important source sentences, staying faithful by construction but risking choppy, redundant output unless it penalizes overlap.

Check yourself

1. Why is extractive summarization faithful by construction?

2. What does a redundancy penalty prevent?