What a Heap File Is
A heap file is the simplest way to lay out a table on disk: rows are stored in no particular order, placed wherever free space exists. A new row goes into any page with room, often the most recently used one.
How Rows Are Found
Because the order is arbitrary, a row is located by its physical address, often a page number plus a slot, called a row identifier or tuple id. Secondary indexes point at these identifiers.
- A lookup by indexed column reads the index, gets the row identifier, then fetches the page.
- A query with no useful index must perform a full table scan, reading every page.
Heap Versus Clustered
The main alternative is a clustered organization, where rows are physically sorted by the primary key, as in a B tree index organized table.
- Heaps make inserts cheap, since a row drops into any free spot.
- Clustered tables make range scans on the key cheap, since adjacent keys sit on adjacent pages.
- Heaps can leave gaps after deletes, needing periodic vacuum or reorganization.
Where Heaps Fit
Heaps suit insert heavy tables and staging areas where access is mostly through secondary indexes rather than ordered scans of the primary key.
Key idea
A heap file stores rows unordered for cheap inserts, relying on indexes or full scans to find rows, unlike a clustered table sorted by its key.