Runs and the breaks between them
The gaps and islands problem asks you to identify consecutive sequences, called islands, and the missing stretches between them, called gaps. Examples include consecutive login days, unbroken ranges of ids, or periods of continuous activity.
The difference of two sequences trick
The classic solution compares two running sequences:
- Generate ROW NUMBER ordered by the value.
- Subtract that row number from the value itself, or from a grouping key.
- Rows in the same island share the same difference, because both sequences advance in lockstep.
When the values are consecutive, the gap between the value and its row number stays constant. A break in consecutiveness shifts that difference, starting a new island.
Grouping by the constant
Once each row carries this constant difference, you GROUP BY it. Each group is one island. MIN and MAX of the value within a group give the island's start and end. To find gaps, use LEAD on the island ends to compare each island's end with the next island's start.
Why it works
Both the actual value and the row number increase by one across consecutive rows, so their difference is invariant within a run and only changes at a discontinuity.
Key idea
Gaps and islands are solved by subtracting a row number from the value; consecutive rows share a constant difference that uniquely labels each island, and grouping by it reveals the runs.