Two enemies of a good feed
A feed feels broken in two ways. It shows the same post again, which feels repetitive, or it keeps surfacing old content, which feels stale. Deduplication fights the first and freshness fights the second.
Deduplication
The system tracks what each user has already seen. On each request it filters out posts whose ids are in the seen set, so a refresh shows new material rather than repeats.
- The seen set is stored per user, often a bounded structure of recent ids.
- A probabilistic filter such as a Bloom filter can hold many seen ids in little memory, at the cost of rare false positives: an unseen post is occasionally treated as seen and dropped.
- Reshares and edits need care so the same content does not slip through under a new id.
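The two storage options above can be sketched as follows. This is a minimal illustration, not a production design: the class names `BoundedSeenSet` and `BloomFilter`, the helper `filter_unseen`, and the default sizes are all assumptions made for the example.

```python
import hashlib
from collections import OrderedDict


class BoundedSeenSet:
    """Exact per-user seen set that keeps only the most recent `capacity` ids."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._ids = OrderedDict()

    def add(self, post_id):
        self._ids[post_id] = None
        self._ids.move_to_end(post_id)      # mark as most recently seen
        if len(self._ids) > self.capacity:
            self._ids.popitem(last=False)   # evict the oldest id

    def __contains__(self, post_id):
        return post_id in self._ids


class BloomFilter:
    """Approximate seen set: never misses a seen id, may rarely flag an unseen one."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self._bits = bytearray(num_bits // 8 + 1)

    def _positions(self, post_id):
        # Derive several bit positions by salting one hash function (an assumed scheme).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{post_id}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, post_id):
        for pos in self._positions(post_id):
            self._bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, post_id):
        return all(
            self._bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(post_id)
        )


def filter_unseen(candidate_ids, seen):
    """Drop candidates already in the seen structure, preserving order."""
    return [pid for pid in candidate_ids if pid not in seen]


seen = BoundedSeenSet(capacity=2)
for pid in ["a", "b", "c"]:
    seen.add(pid)                           # "a" is evicted once "c" arrives
print(filter_unseen(["a", "b", "c", "d"], seen))
```

The bounded set is exact but forgets the oldest ids, so a very old post can reappear. The Bloom filter never forgets within its lifetime but cannot remove ids and occasionally hides a post the user has not seen.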
Freshness
Freshness pushes newer content up. Ranking applies time decay so older posts lose score, and retrieval favors recent posts. A feed that ignores freshness fills with high-scoring but stale items.
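One common way to express time decay is an exponential half-life: the score halves every fixed number of hours. The sketch below assumes that form; the function name `decayed_score`, the `Post` fields, and the 6-hour half-life are illustrative choices, not values from any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    post_id: str
    relevance: float   # base ranking score before decay (assumed to be given)
    age_hours: float   # time since the post was published


def decayed_score(relevance, age_hours, half_life_hours=6.0):
    """Halve the score for every `half_life_hours` of age."""
    return relevance * 0.5 ** (age_hours / half_life_hours)


def rank_by_freshness(posts, half_life_hours=6.0):
    """Sort posts by decayed score, highest first."""
    return sorted(
        posts,
        key=lambda p: decayed_score(p.relevance, p.age_hours, half_life_hours),
        reverse=True,
    )


posts = [
    Post("old_hit", relevance=0.9, age_hours=48.0),
    Post("new_ok", relevance=0.5, age_hours=1.0),
]
print([p.post_id for p in rank_by_freshness(posts)])
```

With these numbers the two-day-old post, despite its higher base relevance, ranks below the one-hour-old post. A longer half-life would weaken that effect and let relevance dominate.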
The balance
Dedup removes what the user has already seen, freshness promotes the new, and ranking still respects relevance. Together they keep each refresh feeling alive.
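Put together, a single feed request might look like the sketch below: filter out seen ids, rank the remainder by decayed relevance, return a page, and record that page as seen. The names `Candidate` and `build_feed`, the plain `set` used as the seen set, and the half-life default are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    post_id: str
    relevance: float
    age_hours: float


def build_feed(candidates, seen_ids, page_size=3, half_life_hours=6.0):
    """Return one page of post ids and mark them as seen for the next request."""
    # Deduplication: drop anything this user has already been shown.
    unseen = [c for c in candidates if c.post_id not in seen_ids]
    # Freshness plus relevance: base score halves every `half_life_hours`.
    ranked = sorted(
        unseen,
        key=lambda c: c.relevance * 0.5 ** (c.age_hours / half_life_hours),
        reverse=True,
    )
    page = ranked[:page_size]
    seen_ids.update(c.post_id for c in page)
    return [c.post_id for c in page]


candidates = [
    Candidate("a", relevance=0.9, age_hours=1.0),
    Candidate("b", relevance=0.8, age_hours=2.0),
    Candidate("c", relevance=0.95, age_hours=72.0),
    Candidate("d", relevance=0.5, age_hours=0.5),
]
seen_ids = set()
print(build_feed(candidates, seen_ids, page_size=2))  # first refresh
print(build_feed(candidates, seen_ids, page_size=2))  # second refresh, no repeats
```

Each refresh returns different posts because the previous page is now in the seen set, and the three-day-old post surfaces only after the fresher candidates are exhausted.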
Key idea
Deduplication filters out posts a user has already seen using a per-user seen set, while freshness applies time decay so each refresh shows new, recent content.