Trading CPU for Input Output
Storage engines can store pages compressed on disk. A page is compressed before it is written and decompressed after it is read. This shrinks the file and cuts how many bytes the disk must move, paying with extra CPU work.
Block Versus Page
Engines compress a unit of data, sometimes a single page and sometimes a larger block of several pages.
- A larger compression block usually packs better because the algorithm sees more redundancy.
- But reading one row may force decompressing a whole large block, wasting work.
- A smaller block decompresses less per read but compresses less tightly.
Why It Helps LSM and B Trees
In an LSM tree, immutable SSTables are a natural fit because they are written once and read many times, so paying to compress once is cheap. In a B tree, compression must cope with in place updates, which is harder because a page may grow when rewritten.
Tuning
Choosing a faster algorithm favors CPU bound systems, while a stronger algorithm favors storage bound ones. The right choice depends on whether the bottleneck is the processor or the disk.
Key idea
Page compression trades CPU for smaller files and less disk traffic, with block size and algorithm choice balancing read overhead against compression ratio.