← Lessons

quiz vs the machine

Platinum1760

System Design

Follower Graph Storage

Storing who follows whom so fan out and feed queries stay fast at scale.

5 min read · advanced · beat Platinum to climb

The graph behind the feed

Every feed operation needs the follow graph, the set of directed edges where one user follows another. Fan out needs the followers of an author. Feed assembly needs the accounts a user follows. Both must be fast.

Two views of the same edge

A follow is one edge but is read in two directions, so systems store both:

  • A followers list indexed by author, used at post time to find who to fan out to.
  • A following list indexed by user, used at read time to know whose posts to gather.

Keeping both indexes means a new follow writes to two places, but each query reads exactly one.

Scaling the graph

  • The graph is huge and skewed, so it is partitioned, often by user id.
  • Edges are simple, so a key value or wide column store handles billions of them well.
  • Hot accounts with millions of followers may need their follower list sharded across partitions to avoid a single hot key.

Why a real graph database is rare here

Feeds mostly need one hop, the direct neighbors, not deep traversals. A partitioned key value design serves that one hop pattern faster and cheaper than a general graph engine.

Key idea

The follow graph is stored as two indexes, followers by author and following by user, partitioned and sometimes sharded so each one hop feed query reads a single place.

Check yourself

Answer to earn rating on the learn ladder.

1. Why store both a followers index and a following index?

2. Why is a general graph database often unnecessary for feeds?

3. How are very popular accounts handled in the graph?