The MapReduce Paradigm

The model

MapReduce expresses a large batch computation as two pure functions. The map function turns each input record into zero or more key-value pairs. The reduce function takes all the values that share a key and combines them into a single result for that key.
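To make the two roles concrete, here is a minimal single-process sketch of the model. The helper name run_mapreduce and the parity example are illustrative assumptions, not part of any real framework; a real system would spread the same steps across many machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


def run_mapreduce(
    records: Iterable,
    map_fn: Callable[[object], Iterable[Tuple[Hashable, object]]],
    reduce_fn: Callable[[Hashable, List], object],
) -> Dict:
    """Sequential stand-in for a MapReduce run (hypothetical helper)."""
    groups = defaultdict(list)
    # Map phase: each record yields zero or more (key, value) pairs.
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    # Reduce phase: all values sharing a key are combined into one result.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def parity_map(n):
    # Negative numbers emit nothing, illustrating "zero or more" pairs.
    if n >= 0:
        yield ("even" if n % 2 == 0 else "odd", n)


def parity_sum(key, values):
    return sum(values)


totals = run_mapreduce([1, 2, 3, 4, -5], parity_map, parity_sum)
print(totals)
```

Neither parity_map nor parity_sum touches shared state, which is what lets a framework run many copies of them at once.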

Why it scales

Map tasks are independent of one another, so the framework can run them in parallel across many machines, each working on data stored locally.
The framework then groups the emitted pairs by key (the shuffle) and runs the reduce tasks in parallel, one per key group.
A failed task can simply be rerun, because both functions are deterministic: the same input always produces the same output.
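The three points above can be sketched on one machine, with a thread pool standing in for a cluster. Everything here is an assumption made for illustration: the function names, the retry count, and the simulated crash are hypothetical, and a real framework schedules tasks across machines rather than threads.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_task(split):
    """Map task: handles one input split, independent of all others."""
    return [(n % 3, n) for n in split]


def reduce_task(item):
    """Reduce task: combines every value that shares one key."""
    key, values = item
    return key, sum(values)


def with_retry(task, arg, attempts=3):
    """Rerun a failed task; safe only because the task is deterministic."""
    for attempt in range(attempts):
        try:
            return task(arg)
        except RuntimeError:
            if attempt == attempts - 1:
                raise


crashed = set()


def flaky_map_task(split):
    """Same as map_task, but fails the first time it sees each split."""
    marker = tuple(split)
    if marker not in crashed:
        crashed.add(marker)
        raise RuntimeError("simulated worker crash")
    return map_task(split)


def run_job(splits, map_task_fn, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Map tasks share no state, so they can run concurrently.
        mapped = list(pool.map(lambda s: with_retry(map_task_fn, s), splits))
        # Shuffle: gather values by key across all map outputs.
        groups = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                groups[key].append(value)
        # One reduce task per key group, again run concurrently.
        return dict(pool.map(reduce_task, groups.items()))


splits = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
clean = run_job(splits, map_task)
recovered = run_job(splits, flaky_map_task)
print(clean == recovered)
```

The run with simulated crashes produces the same result as the clean run, which is the property that makes rerunning a safe recovery strategy.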

A canonical example

Word count is the standard illustration: map emits the pair (word, 1) for each word it sees, and reduce sums the ones for each word. The same shape handles log analysis, index building, and aggregation.
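A minimal sketch of word count follows. The function names are hypothetical, and lowercasing plus whitespace splitting are simplifying assumptions about what counts as a word; the grouping loop plays the part a framework would normally handle.

```python
from collections import defaultdict


def map_words(line):
    """Map: emit (word, 1) for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, ones):
    """Reduce: sum the ones collected for a single word."""
    return sum(ones)


def word_count(lines):
    # Stand-in for the framework: group the emitted pairs by key.
    groups = defaultdict(list)
    for line in lines:
        for word, one in map_words(line):
            groups[word].append(one)
    return {word: reduce_counts(word, ones) for word, ones in groups.items()}


counts = word_count(["the quick fox", "the lazy dog", "The end"])
print(counts["the"])
```

Swapping map_words for a function that emits, say, (status code, 1) per log line would turn the same program into a simple log analysis.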

The appeal is that the engineer writes only the map and reduce functions, while the framework handles distribution, scheduling, and fault tolerance.

Key idea

MapReduce expresses large batch jobs as a parallel map followed by a grouped reduce, leaving scaling and failure handling to the framework.
