Two Layers of Caching
When a database engine reads a file through ordinary buffered calls, the operating system keeps its own copy of that data in its page cache. The same page can therefore sit in both the database buffer pool and the operating system page cache, a redundancy called double caching that wastes memory.
Direct Input Output
Direct input output tells the operating system to skip its page cache and move data straight between disk and the engine buffers.
- The engine controls exactly what is cached, avoiding double caching.
- Memory the operating system would have used for caching is freed for the buffer pool.
- In exchange, the engine takes full responsibility for caching decisions such as read ahead and eviction, which is more work to get right.
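As a rough illustration of the points above, here is a minimal Python sketch of a direct read on Linux. It assumes a 4096 byte block size (real code should query the device), and the names BLOCK, read_block and read_block_with_fallback are hypothetical, not taken from any particular engine. O_DIRECT generally requires the buffer, the file offset and the length to be block aligned, which is why the sketch reads into an anonymous mmap, which is page aligned, rather than an ordinary bytes object.

```python
import mmap
import os

BLOCK = 4096  # assumed block size; real code should query the device


def read_block(path, block_no=0, direct=True):
    """Read one block, optionally asking the OS to bypass its page cache."""
    flags = os.O_RDONLY
    if direct:
        # O_DIRECT is Linux-specific; on other platforms this adds nothing.
        flags |= getattr(os, "O_DIRECT", 0)
    fd = os.open(path, flags)
    try:
        # An anonymous mmap is page aligned, which O_DIRECT requires.
        buf = mmap.mmap(-1, BLOCK)
        try:
            os.lseek(fd, block_no * BLOCK, os.SEEK_SET)  # aligned offset
            n = os.readv(fd, [buf])                      # aligned length
            return buf[:n]  # copy out the bytes actually read
        finally:
            buf.close()
    finally:
        os.close(fd)


def read_block_with_fallback(path, block_no=0):
    """Try a direct read; fall back to a buffered read if it is rejected."""
    try:
        return read_block(path, block_no, direct=True)
    except OSError:
        # Some filesystems do not support O_DIRECT at all.
        return read_block(path, block_no, direct=False)
```

A real engine would read into a frame of its own buffer pool instead of copying the bytes out, so the only cached copy of the page is the one the engine manages.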
Durability Differences
With buffered input output, a write first lands in the operating system page cache and becomes durable only after an fsync flushes it to the device. Direct input output bypasses the page cache but does not by itself guarantee durability: it still generally needs a sync call or a synchronous open flag such as O_SYNC or O_DSYNC to guarantee the data has reached stable storage, since drives have their own volatile write caches.
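The two durability options can be sketched in Python as follows. This is an illustration under assumptions, not a complete recipe: the function names are hypothetical, O_DSYNC is only available on some platforms (the sketch silently skips it elsewhere), and a production engine would usually also sync the containing directory after creating a file.

```python
import os


def _write_all(fd, data):
    """Loop until every byte is handed to the OS; os.write may write less."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def write_then_fsync(path, data):
    """Buffered write: data sits in the page cache until fsync returns."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        _write_all(fd, data)
        os.fsync(fd)  # ask the OS to flush this file's data to the device
    finally:
        os.close(fd)


def write_with_dsync(path, data):
    """Synchronous flag: each write returns only after the data is flushed."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    flags |= getattr(os, "O_DSYNC", 0)  # not defined on every platform
    fd = os.open(path, flags, 0o644)
    try:
        _write_all(fd, data)
    finally:
        os.close(fd)
```

The first form lets many writes be batched behind a single fsync, while the second pays the flush cost on every write. Which is cheaper depends on the workload.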
When Each Wins
- A database with a large, well-tuned buffer pool often prefers direct input output to avoid double caching and gain control.
- A simpler system, or one with a small buffer pool, may let the operating system cache do the work.
Key idea
Buffered input output leans on the operating system page cache and risks double caching, while direct input output bypasses it for tighter control at the cost of managing caching and durability yourself.