System Design

Deduplication And Freshness

Keeping feeds from repeating seen posts while still surfacing new content.

Two enemies of a good feed

A feed feels broken in two ways. It repeats posts the user has already seen, or it fills with old content while newer posts go unshown. Deduplication fights the first problem and freshness fights the second.

Deduplication

The system tracks what each user has already seen. On each request it filters out posts whose ids are in the seen set, so a refresh shows new material rather than repeats.

  • The seen set is stored per user, often a bounded structure of recent ids.
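  • A probabilistic filter like a Bloom filter can hold many seen ids in little memory. The cost is a rare false positive, where an unseen post is reported as seen and wrongly filtered out. It never reports a seen post as unseen.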
  • Reshares and edits need care so the same content does not slip through under a new id.
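To make the Bloom filter trade-off concrete, here is a minimal sketch of a per-user seen set. The class name `BloomSeenSet`, the helper `filter_unseen`, the bit-array size, and the number of hashes are all assumptions chosen for illustration, not values from this lesson. A production system would size the filter from the expected number of ids and the false-positive rate it can tolerate.

```python
import hashlib


class BloomSeenSet:
    """Illustrative per-user Bloom filter; the sizes are assumed, not tuned."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, post_id: str):
        # Derive one bit position per hash by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{post_id}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, post_id: str) -> None:
        for pos in self._positions(post_id):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, post_id: str) -> bool:
        # False means definitely unseen. True means seen, or a rare false positive.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(post_id)
        )


def filter_unseen(candidate_ids, seen):
    """Keep only the candidates the filter reports as not yet seen."""
    return [pid for pid in candidate_ids if not seen.might_contain(pid)]


seen = BloomSeenSet()
seen.add("post-1")
seen.add("post-2")
fresh = filter_unseen(["post-1", "post-2", "post-3"], seen)
```

Ids that were added are always filtered out, because a Bloom filter has no false negatives. The price of the compact representation is that an unseen id such as `post-3` could occasionally be dropped too, which a feed can usually tolerate since there is plenty of other content to show.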

Freshness

Freshness pushes newer content up. Ranking applies time decay, so a post's score shrinks as it ages, and retrieval favors recent posts when it gathers candidates. A feed that ignores freshness fills with high-scoring but stale items.
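One common way to express time decay is an exponential half-life, sketched below. The lesson does not name a specific decay function, so the formula, the function name `decayed_score`, and the 6-hour half-life are assumptions for illustration only.

```python
def decayed_score(relevance: float, age_hours: float, half_life_hours: float = 6.0) -> float:
    """Halve the relevance score once per half-life of age (assumed decay form)."""
    return relevance * 0.5 ** (age_hours / half_life_hours)


# A highly relevant but day-old post versus a moderately relevant hour-old post.
old_post = decayed_score(relevance=0.9, age_hours=24.0)  # 0.9 * 0.5**4 = 0.05625
new_post = decayed_score(relevance=0.6, age_hours=1.0)
```

With these assumed numbers the newer post outranks the older one even though its raw relevance is lower. A longer half-life would let strong older posts hold their position for longer.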

The balance

Deduplication removes what the user has already seen, freshness promotes the new, and ranking still respects relevance. Together they keep each refresh feeling alive.
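The sketch below shows how the pieces could fit together on a single refresh. It filters out seen ids, ranks the remainder by decayed relevance, and records what was served. Every name here (`Post`, `RecentSeenSet`, `build_feed`) is hypothetical, as are the capacity, page size, and half-life values. Treating "served" as "seen" is also a simplifying assumption, since a real system might only mark posts that were actually viewed.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    relevance: float
    age_hours: float


class RecentSeenSet:
    """Bounded per-user seen set that evicts the oldest ids first."""

    def __init__(self, capacity: int = 1000) -> None:
        self.capacity = capacity
        self._ids = OrderedDict()

    def add(self, post_id: str) -> None:
        self._ids[post_id] = None
        self._ids.move_to_end(post_id)
        while len(self._ids) > self.capacity:
            self._ids.popitem(last=False)  # drop the oldest id

    def __contains__(self, post_id: str) -> bool:
        return post_id in self._ids


def build_feed(candidates, seen, page_size=2, half_life_hours=6.0):
    """Deduplicate, rank by time-decayed relevance, and mark the page as seen."""
    unseen = [p for p in candidates if p.post_id not in seen]
    ranked = sorted(
        unseen,
        key=lambda p: p.relevance * 0.5 ** (p.age_hours / half_life_hours),
        reverse=True,
    )
    page = ranked[:page_size]
    for post in page:
        seen.add(post.post_id)
    return [post.post_id for post in page]


candidates = [
    Post("a", relevance=0.9, age_hours=48.0),
    Post("b", relevance=0.7, age_hours=2.0),
    Post("c", relevance=0.6, age_hours=1.0),
    Post("d", relevance=0.5, age_hours=0.5),
]
seen = RecentSeenSet()
first_refresh = build_feed(candidates, seen)   # ['b', 'c']
second_refresh = build_feed(candidates, seen)  # ['d', 'a']
```

The second refresh repeats nothing from the first. The stale but relevant post `a` still appears eventually, once fresher candidates have been consumed.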

Key idea

Deduplication filters out posts a user has already seen using a per-user seen set, while freshness applies time decay so each refresh shows new, recent content.

Check yourself

1. How does a feed avoid showing the same post twice?

2. Why use a Bloom filter for the seen set?

3. How does ranking support freshness?