← Lessons

quiz vs the machine

Platinum1780

Machine Learning

Link Prediction

Predict which edges are missing or will form, the heart of graph recommendation.

5 min read · advanced · beat Platinum to climb

The task

Link prediction asks whether an edge should exist between two nodes. Will these two users become friends? Should this user be connected to this product? Recommendation is itself link prediction on a user item graph.

How it is framed

  • Learn an embedding for each node, often with a GNN.
  • Score a candidate pair by combining their two embeddings, commonly a dot product or a small network over their concatenation.
  • A high score means the link is likely.

Training with negatives

Real edges are positives. But the graph has only a few edges out of all possible pairs, so you must sample negatives: non existent edges treated as negative examples. The model learns to score true edges above sampled non edges, a ranking objective.

Evaluation pitfalls

  • Leakage is dangerous: if a test edge stays in the graph during training, the model cheats. Remove test edges from the message passing graph.
  • Metrics use ranking measures such as area under the curve and mean reciprocal rank against sampled negatives.

Key idea

Link prediction scores node pairs from their embeddings to find missing or future edges, trained with sampled negatives and guarded against edge leakage.

Check yourself

Answer to earn rating on the learn ladder.

1. How is a candidate link usually scored?

2. Why are negative samples needed in link prediction?

3. What evaluation pitfall must be avoided?