Databases

Direct IO vs Page Cache

A database can rely on the operating system page cache or bypass it with direct I/O. The choice determines who controls caching and how durability is achieved.

Two Layers of Caching

When an engine reads a file with ordinary buffered I/O, the operating system keeps a copy of that data in its page cache. The same page can therefore sit in both the database buffer pool and the operating system cache, a redundancy called double caching that wastes memory.
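
To make double caching concrete, here is a minimal sketch in Python that models both layers as small LRU caches. The `LRU` class, the cache sizes, and `read_from_disk` are hypothetical stand-ins for illustration, not how any particular engine or kernel is implemented.

```python
from collections import OrderedDict


class LRU:
    """Tiny least-recently-used cache keyed by page id (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def get(self, page_id, load):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        data = load(page_id)                 # miss: fetch from the layer below
        self.pages[page_id] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)   # evict the least recently used page
        return data


def read_from_disk(page_id):
    # Hypothetical stand-in for a real device read.
    return f"page-{page_id}"


os_cache = LRU(capacity=4)      # models the operating system page cache
buffer_pool = LRU(capacity=4)   # models the database buffer pool


def buffered_read(page_id):
    # A buffer pool miss falls through to the OS cache, which keeps its own copy.
    return buffer_pool.get(page_id, lambda p: os_cache.get(p, read_from_disk))


for page_id in [1, 2, 3, 4]:
    buffered_read(page_id)

duplicated = set(buffer_pool.pages) & set(os_cache.pages)
print(sorted(duplicated))  # [1, 2, 3, 4]
```

In this toy model, eight cache slots end up holding only four distinct pages, which is the memory waste the paragraph above describes.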

Direct I/O

Direct I/O (for example, the O_DIRECT open flag on Linux) tells the operating system to skip its page cache and transfer data straight between the storage device and the engine's own buffers.

  • The engine controls exactly what is cached, avoiding double caching.
  • Memory the operating system would have used for caching is freed for the buffer pool.
  • The engine takes full responsibility for caching decisions, including the read-ahead and write-back the operating system would otherwise provide, which is more work to get right.
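
As a rough illustration of what bypassing the page cache looks like in practice, the sketch below reads one block on Linux using the O_DIRECT flag. It assumes a 4096-byte block size and uses an anonymous `mmap` as a page-aligned buffer, because direct I/O typically requires the buffer address, file offset, and length to be aligned. The function name `read_block` and the fallback to buffered reads are choices made for this example only.

```python
import mmap
import os

BLOCK = 4096  # assumed block size; real code should query the device


def read_block(path, block_no=0):
    """Read one aligned block, bypassing the page cache where supported."""
    direct = getattr(os, "O_DIRECT", 0)  # 0 on platforms without the flag
    last_error = None
    # Try direct I/O first, then fall back to an ordinary buffered read,
    # since some filesystems reject O_DIRECT.
    for flags in (os.O_RDONLY | direct, os.O_RDONLY):
        try:
            fd = os.open(path, flags)
        except OSError as exc:
            last_error = exc
            continue
        try:
            # An anonymous mmap is page aligned, which satisfies the
            # alignment that direct I/O usually demands.
            with mmap.mmap(-1, BLOCK) as buf:
                n = os.preadv(fd, [buf], block_no * BLOCK)
                return buf[:n]
        except OSError as exc:
            last_error = exc
        finally:
            os.close(fd)
    raise last_error
```

A real engine would keep a pool of aligned buffers rather than allocating one per read, and would decide for itself which blocks to retain, since nothing below it is caching them any more.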

Durability Differences

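With buffered I/O, a write lands in the operating system cache first and is durable only after an fsync (or fdatasync) flushes it to the device. Direct I/O skips the page cache but does not by itself guarantee durability: drives have their own volatile write caches, and file metadata such as a new file size may still be unwritten. The engine therefore still generally needs an explicit sync, or a synchronous open flag such as O_DSYNC, to know the data has reached stable storage.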
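
The buffered path can be sketched in a few lines of Python. This is a minimal example, assuming an append-only log file; the function name `durable_append` is hypothetical. The point is that `os.write` returning only means the data is in the operating system cache, and the record counts as durable only once `os.fsync` returns.

```python
import os


def durable_append(path, payload: bytes):
    """Append payload to path and return only after it has been synced."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        view = memoryview(payload)
        while view:
            written = os.write(fd, view)  # may write fewer bytes than requested
            view = view[written:]
        # At this point the data may exist only in the OS page cache.
        os.fsync(fd)  # ask the OS to flush the file to the storage device
    finally:
        os.close(fd)
```

A caller would acknowledge a commit only after `durable_append` returns. The same final sync step applies when the file is opened for direct I/O, unless a synchronous flag is used instead.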

When Each Wins

  • A database with a large, well-tuned buffer pool often prefers direct I/O to avoid double caching and gain control.
  • A simpler system, or one with a small buffer pool, may let the operating system cache do the work.

Key idea

Buffered I/O leans on the operating system page cache and risks double caching, while direct I/O bypasses it for tighter control at the cost of managing caching yourself. Either way, durability still requires an explicit sync.

Check yourself

1. What problem does direct I/O avoid?

2. What does an engine take on when using direct I/O?

3. Why might direct I/O still need an explicit sync?