Where Bloat Comes From
In an MVCC database an UPDATE writes a new row version and marks the old one dead rather than overwriting in place. A DELETE marks the row dead too. These dead tuples still occupy pages until vacuum reclaims the space. Indexes accumulate stale entries the same way.
Why It Hurts
- Pages hold dead rows, so the same data spreads across more pages.
- Scans read more pages to find the same live rows, raising IO.
- The cache holds fewer useful rows per page, lowering hit rate.
Measuring It
Bloat is the gap between the space the live data needs and the space the relation actually occupies. Estimation queries and extensions compare live tuple sizes against on disk size to flag tables and indexes that are mostly empty space.
Fixing It
- Routine autovacuum reclaims dead space for reuse, holding bloat steady.
- A heavily bloated table may need a rewrite, such as VACUUM FULL or a tool that rebuilds without a long exclusive lock.
- Rebuilding an index with REINDEX clears accumulated stale entries.
Key idea
Dead tuples from updates and deletes bloat tables and indexes, inflating pages and IO, so steady vacuuming and occasional rebuilds keep scans fast.