System Design

The MapReduce Paradigm

Splitting a huge job into independent map tasks, followed by reduce tasks that combine their output.

The model

MapReduce expresses a large batch computation as two pure functions. The map function turns each input record into zero or more key-value pairs. The reduce function takes all the values that share a key and combines them into a single result for that key.
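The two function shapes can be sketched in Python. This is only an illustration: the type aliases, the names `map_status` and `reduce_count`, and the assumption that a log line ends with its status code are all hypothetical and not part of any particular framework.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical type aliases for illustration; real frameworks define their own.
MapFn = Callable[[str], Iterable[Tuple[str, int]]]
ReduceFn = Callable[[str, Iterable[int]], int]


def map_status(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (status_code, 1) for a log line, or nothing if the line is blank."""
    parts = line.split()
    if parts:                  # a blank record produces zero pairs
        yield (parts[-1], 1)   # assumes the status code is the last field


def reduce_count(key: str, values: Iterable[int]) -> int:
    """Combine every value that shares a key into a single result."""
    return sum(values)
```

Neither function touches shared state, which is what lets a framework run many copies of them at once.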

Why it scales

  • Map tasks are independent, so the framework runs them in parallel across many machines on local data.
  • The framework groups pairs by key, then runs reduce tasks in parallel per key group.
  • A failed task can simply be rerun, because both functions are deterministic: the same input always produces the same output.
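The three points above can be sketched as a toy single-machine driver. It is a minimal sketch, not how a production framework is built: `run_mapreduce`, `with_retry`, `workers`, and `max_attempts` are hypothetical names, and a thread pool stands in for a cluster of machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def run_mapreduce(records, map_fn, reduce_fn, workers=4, max_attempts=3):
    """Toy driver: parallel map, group by key, parallel reduce."""

    def with_retry(task, *args):
        # Rerunning is safe only because the task is deterministic.
        for attempt in range(1, max_attempts + 1):
            try:
                return task(*args)
            except Exception:
                if attempt == max_attempts:
                    raise

    def map_task(record):
        # Materialize the pairs so a failure surfaces inside the retry loop.
        return list(map_fn(record))

    def reduce_task(item):
        key, values = item
        return key, reduce_fn(key, values)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map phase: every record is an independent task.
        mapped = pool.map(lambda r: with_retry(map_task, r), records)

        # Shuffle: group all emitted values by key.
        groups = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                groups[key].append(value)

        # Reduce phase: one independent task per key group.
        reduced = pool.map(lambda kv: with_retry(reduce_task, kv), groups.items())
        return dict(reduced)
```

For example, `run_mapreduce(range(10), lambda n: [(n % 2, n)], lambda k, vs: sum(vs))` returns `{0: 20, 1: 25}`, the sums of the even and odd numbers below ten.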

A canonical example

Word count maps each word to the pair (word, 1), then reduce sums the ones for each word. The same shape handles log analysis, index building, and aggregation.
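Word count can be sketched in a few lines of Python. The function names are hypothetical, the grouping step runs in a single process purely to show the data flow, and lowercasing and splitting on whitespace are simplifying assumptions about what counts as a word.

```python
from collections import defaultdict


def map_words(line):
    """Map: emit (word, 1) for every word in the line."""
    for word in line.lower().split():
        yield (word, 1)


def reduce_sum(word, ones):
    """Reduce: sum the ones collected for a single word."""
    return sum(ones)


def word_count(lines):
    # Group step: collect every emitted 1 under its word.
    groups = defaultdict(list)
    for line in lines:
        for word, one in map_words(line):
            groups[word].append(one)
    # Reduce step: one call per distinct word.
    return {word: reduce_sum(word, ones) for word, ones in groups.items()}


counts = word_count(["the quick fox", "the lazy dog", "The end"])
print(counts["the"])  # 3
```

Swapping `map_words` for a function that emits a different key, such as a status code or a document ID, gives the log analysis and index building variants without changing the grouping or reduce steps.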

The power of the model is that the engineer writes only map and reduce, while the framework handles distribution, scheduling, and fault tolerance.

Key idea

MapReduce expresses big batch work as parallel map then grouped reduce, letting the framework handle scaling and failures.

Check yourself

1. What does the map function produce?

2. Why can a failed map task simply be rerun?

3. What does the framework do between map and reduce?