System Design

Cost Optimization for Analytics

Cutting compute and storage spend in cloud data platforms without losing speed.

6 min read · advanced

Why costs explode

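Cloud analytics platforms usually bill compute in one of two ways:

- By the bytes each query scans, as BigQuery on-demand and Athena do.
- By how long a warehouse runs, as Snowflake does with credits for warehouse runtime.

Storage is typically billed separately, by volume. Careless queries and bloated storage can multiply spend without adding any value. The goal is to scan less, run less, and store less while keeping results fast.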
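To make the scan-billed model concrete, here is a minimal Python sketch of the arithmetic. It assumes a price of 5.00 USD per TiB scanned and a dashboard query that runs 100 times a day. Both figures are placeholders, not any provider's actual rates.

```python
# Minimal sketch of scan-based billing.
# PRICE_PER_TIB and the run count are hypothetical placeholders, not vendor rates.
PRICE_PER_TIB = 5.00  # assumed USD per TiB scanned
TIB = 1024 ** 4       # bytes in one tebibyte


def scan_cost(bytes_scanned, price_per_tib=PRICE_PER_TIB):
    """Return the cost in USD of a query that scans `bytes_scanned` bytes."""
    return bytes_scanned / TIB * price_per_tib


full_scan = 2 * TIB            # query reads the whole 2 TiB table
pruned_scan = 50 * 1024 ** 3   # same query pruned to 50 GiB
runs_per_month = 100 * 30      # assumed: 100 runs a day for 30 days

print(f"monthly full:   ${scan_cost(full_scan) * runs_per_month:,.2f}")    # monthly full:   $30,000.00
print(f"monthly pruned: ${scan_cost(pruned_scan) * runs_per_month:,.2f}")  # monthly pruned: $732.42
```

The result is identical either way. Only the bytes read change, which is why the techniques below target the scan first.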

Reducing data scanned

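  • Partition and cluster tables on commonly filtered keys, such as event date, so queries prune to only the slices they need.
  • Store data in a columnar format such as Parquet or ORC so that only the columns a query references are read.
  • Avoid SELECT * on wide tables when only a few columns are needed. In a columnar format, each extra column adds bytes scanned.
  • Compact many small files into fewer, larger ones so engines do not pay per-file overhead for opening each file and reading its metadata.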
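Partition pruning and column projection multiply together. The toy model below makes that visible. It is a sketch, not a real engine.

- **Layout:** a table partitioned by day, with fixed per-column sizes.
- **Pruning:** partitions outside the query's date range are skipped.
- **Projection:** only the requested columns are counted.
- **Assumptions:** the schema, the sizes, and the `bytes_scanned` helper are all hypothetical.

```python
# Toy estimate of bytes scanned for a date-partitioned, columnar table.
# The schema, sizes, and dates are hypothetical.
from datetime import date, timedelta

GIB = 1024 ** 3

# Assumed bytes per column in each daily partition.
COLUMN_BYTES = {
    "event_id": 2 * GIB,
    "user_id": 1 * GIB,
    "payload": 20 * GIB,  # wide column most queries never need
    "amount": 1 * GIB,
}

# One year of daily partitions, all the same shape.
START = date(2024, 1, 1)
PARTITIONS = {START + timedelta(days=i): COLUMN_BYTES for i in range(365)}


def bytes_scanned(columns, first_day, last_day, prune=True, columnar=True):
    """Estimate bytes read by a query over [first_day, last_day].

    prune=False: no usable partitioning, so every partition is read.
    columnar=False: row format, so every column is read.
    """
    total = 0
    for day, cols in PARTITIONS.items():
        if prune and not (first_day <= day <= last_day):
            continue  # partition pruned: its files are never opened
        needed = columns if columnar else cols.keys()
        total += sum(cols[c] for c in needed)
    return total


week = (date(2024, 3, 1), date(2024, 3, 7))
all_cols = list(COLUMN_BYTES)

naive = bytes_scanned(all_cols, *week, prune=False, columnar=False)
star = bytes_scanned(all_cols, *week)                # pruned, but SELECT *
tuned = bytes_scanned(["user_id", "amount"], *week)  # pruned and projected
print(naive // GIB, star // GIB, tuned // GIB)  # 8760 168 14
```

Pruning alone cuts the one-week query from 8760 GiB to 168 GiB. Dropping SELECT * then cuts it to 14 GiB, because the wide `payload` column is never read.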

Reducing compute time

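  • Right-size warehouses to the workload and enable auto-suspend so idle clusters stop billing.
  • Materialize expensive, frequently repeated aggregations as summary tables or materialized views instead of recomputing them from raw data on every query.
  • Cache results of common dashboard queries, or precompute them, so repeated refreshes do not rerun the same work.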
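For runtime-billed warehouses, the effect of auto-suspend depends on how bursty the workload is. This minimal Python sketch models a warehouse that stays up until it has been idle for `suspend_after` seconds and is billed for every second it runs. The query schedule and the 300-second timeout are hypothetical, and `billed_seconds` is an illustrative helper, not a platform API.

```python
# Sketch: billed warehouse time with and without auto-suspend.
# The query schedule and the 300 s idle timeout are hypothetical.


def billed_seconds(queries, suspend_after):
    """Return billed seconds for a warehouse that auto-suspends.

    queries: iterable of (start_sec, duration_sec) pairs, in any order.
    suspend_after: idle seconds before the warehouse suspends. The idle
    tail before suspension is billed, and so is any gap short enough
    that the warehouse never suspended.
    """
    billed = 0
    run_start = run_end = None
    for start, duration in sorted(queries):
        end = start + duration
        if run_end is not None and start <= run_end + suspend_after:
            run_end = max(run_end, end)  # still running: extend this run
        else:
            if run_end is not None:
                billed += run_end + suspend_after - run_start  # close previous run
            run_start, run_end = start, end  # resume (or first start)
    if run_end is not None:
        billed += run_end + suspend_after - run_start
    return billed


HOUR = 3600
# Four 2-minute queries at 09:00, 09:05, 13:00, 17:30 (seconds since midnight).
QUERIES = [(9 * HOUR, 120), (9 * HOUR + 300, 120), (13 * HOUR, 120), (17 * HOUR + 1800, 120)]

always_on = 24 * HOUR
auto = billed_seconds(QUERIES, suspend_after=300)
print(f"always on: {always_on} s, auto-suspend: {auto} s")  # always on: 86400 s, auto-suspend: 1560 s
```

The timeout is a trade-off. If it is too short, the warehouse thrashes between suspend and resume. If it is too long, idle tails dominate the bill. Some platforms also bill a minimum period on each resume, which this sketch ignores.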

Reducing storage

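  • Apply lifecycle policies that move cold data to cheaper storage tiers, such as infrequent-access or archive classes, or expire it after a set age.
  • Set sensible retention on raw bronze-layer data, keeping it only as long as you may need to reprocess or audit it.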
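Lifecycle rules themselves are configured in the storage service, not in application code. S3 lifecycle rules, for example, use transition and expiration actions. The Python sketch below only models their effect on the monthly bill. It assumes three tiers, age thresholds of 30, 90, and 365 days, and per-GiB prices that are placeholders rather than real vendor rates.

```python
# Sketch: how an age-based lifecycle rule changes monthly storage cost.
# Thresholds and per-GiB prices are hypothetical placeholders.
from dataclasses import dataclass


@dataclass(frozen=True)
class LifecycleRule:
    to_infrequent_after_days: int = 30
    to_archive_after_days: int = 90
    expire_after_days: int = 365


# Assumed USD per GiB-month for each tier (not real vendor prices).
PRICE_PER_GIB_MONTH = {"hot": 0.02, "infrequent": 0.01, "archive": 0.002}


def tier_for(age_days, rule):
    """Return the tier an object of this age sits in, or 'expired'."""
    if age_days >= rule.expire_after_days:
        return "expired"
    if age_days >= rule.to_archive_after_days:
        return "archive"
    if age_days >= rule.to_infrequent_after_days:
        return "infrequent"
    return "hot"


def monthly_cost(objects, rule=None):
    """objects: (age_days, size_gib) pairs. rule=None keeps everything hot forever."""
    total = 0.0
    for age, size_gib in objects:
        tier = "hot" if rule is None else tier_for(age, rule)
        if tier != "expired":  # expired objects are deleted and cost nothing
            total += size_gib * PRICE_PER_GIB_MONTH[tier]
    return total


# Two years of raw bronze data: one 100 GiB object per day.
OBJECTS = [(age, 100) for age in range(730)]

print(f"no policy:   ${monthly_cost(OBJECTS):,.2f}")                   # no policy:   $1,460.00
print(f"with policy: ${monthly_cost(OBJECTS, LifecycleRule()):,.2f}")  # with policy: $175.00
```

Expiration does most of the work here. Half the objects are older than the retention period, so under the policy they stop costing anything at all.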

Govern with visibility

Tag workloads by team, pipeline, or dashboard, and track cost per query and per team. Set budgets and alerts so a runaway job is caught before it shows up on the invoice.
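Tag-based attribution and budget alerts reduce to a group-by and a threshold check. This Python sketch shows both. The query log, team names, budgets, and 80% alert threshold are hypothetical. In practice, the cost records would come from your platform's billing or query-history export.

```python
# Sketch: attribute query cost to team tags and flag budgets at risk.
# Team names, budgets, costs, and the 80% threshold are hypothetical.
from collections import defaultdict

QUERY_LOG = [
    {"team": "marketing", "cost_usd": 120.0},
    {"team": "finance", "cost_usd": 40.0},
    {"team": "marketing", "cost_usd": 310.0},
    {"team": None, "cost_usd": 25.0},  # untagged workload
    {"team": "finance", "cost_usd": 15.0},
]

MONTHLY_BUDGET_USD = {"marketing": 500.0, "finance": 200.0}
ALERT_AT = 0.8  # alert once a team has used 80% of its budget


def cost_by_team(log):
    """Sum cost per team tag, grouping missing tags under 'untagged'."""
    totals = defaultdict(float)
    for entry in log:
        totals[entry["team"] or "untagged"] += entry["cost_usd"]
    return dict(totals)


def budget_alerts(totals, budgets, alert_at=ALERT_AT):
    """Return (team, spent, budget) for teams at or above the alert threshold.

    Teams with spend but no budget are always reported, so untagged or
    unbudgeted workloads cannot hide.
    """
    alerts = []
    for team, spent in sorted(totals.items()):
        budget = budgets.get(team)
        if budget is None or spent >= alert_at * budget:
            alerts.append((team, spent, budget))
    return alerts


totals = cost_by_team(QUERY_LOG)
for team, spent, budget in budget_alerts(totals, MONTHLY_BUDGET_USD):
    limit = f"${budget:.2f}" if budget is not None else "no budget"
    print(f"ALERT {team}: ${spent:.2f} of {limit}")
```

Reporting unbudgeted spend is a deliberate choice. An untagged job is exactly the kind of runaway workload that otherwise goes unnoticed until the invoice arrives.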

Key idea

Cut analytics cost by scanning less through partitioning and columnar formats, running less with auto-suspend and materialization, and storing less with lifecycle policies.

Check yourself

1. How does partitioning reduce query cost in scan-billed systems?

2. Why enable auto-suspend on a warehouse?

3. What does materializing a repeated aggregation save?