← Lessons

quiz vs the machine

Gold1450

Databases

Table and Index Bloat

Dead tuples and stale index entries inflate storage and slow scans long after the rows that caused them are gone.

5 min read · core · beat Gold to climb

Where Bloat Comes From

In an MVCC database an UPDATE writes a new row version and marks the old one dead rather than overwriting in place. A DELETE marks the row dead too. These dead tuples still occupy pages until vacuum reclaims the space. Indexes accumulate stale entries the same way.

Why It Hurts

  • Pages hold dead rows, so the same data spreads across more pages.
  • Scans read more pages to find the same live rows, raising IO.
  • The cache holds fewer useful rows per page, lowering hit rate.

Measuring It

Bloat is the gap between the space the live data needs and the space the relation actually occupies. Estimation queries and extensions compare live tuple sizes against on disk size to flag tables and indexes that are mostly empty space.

Fixing It

  • Routine autovacuum reclaims dead space for reuse, holding bloat steady.
  • A heavily bloated table may need a rewrite, such as VACUUM FULL or a tool that rebuilds without a long exclusive lock.
  • Rebuilding an index with REINDEX clears accumulated stale entries.

Key idea

Dead tuples from updates and deletes bloat tables and indexes, inflating pages and IO, so steady vacuuming and occasional rebuilds keep scans fast.

Check yourself

Answer to earn rating on the learn ladder.

1. What primarily causes table bloat in an MVCC database?

2. Why does bloat slow down scans?