The task
Link prediction asks whether an edge should exist between two nodes. Will these two users become friends? Should this user be connected to this product? Recommendation is itself link prediction on a user item graph.
How it is framed
- Learn an embedding for each node, often with a GNN.
- Score a candidate pair by combining their two embeddings, commonly a dot product or a small network over their concatenation.
- A high score means the link is likely.
Training with negatives
Real edges are positives. But the graph has only a few edges out of all possible pairs, so you must sample negatives: non existent edges treated as negative examples. The model learns to score true edges above sampled non edges, a ranking objective.
Evaluation pitfalls
- Leakage is dangerous: if a test edge stays in the graph during training, the model cheats. Remove test edges from the message passing graph.
- Metrics use ranking measures such as area under the curve and mean reciprocal rank against sampled negatives.
Key idea
Link prediction scores node pairs from their embeddings to find missing or future edges, trained with sampled negatives and guarded against edge leakage.