The task
Node classification assigns a label to each node in a graph. Typical examples are classifying a paper by topic in a citation graph, a user as bot or human in a social network, or a protein by function. The signal comes from each node's own features plus the graph structure around that node.
The key assumption
Many graphs show homophily: connected nodes tend to share labels. A paper citing many machine learning papers is probably a machine learning paper. Graph neural networks (GNNs) exploit this by mixing each node's features with those of its neighbors, so connected nodes end up with similar representations.
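One way to check how well the assumption holds on a given graph is to measure the fraction of edges that join same-label nodes. The sketch below is a minimal illustration; the function name `edge_homophily` and the toy citation graph are hypothetical, not taken from any library or dataset.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints carry the same label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

# Hypothetical toy citation graph: papers 0-2 are "ML", papers 3-4 are "DB".
labels = ["ML", "ML", "ML", "DB", "DB"]
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (2, 3)]

ratio = edge_homophily(edges, labels)
print(ratio)  # 4 of the 5 edges join same-label papers
```

A ratio near 1 suggests that averaging over neighbors will help; a ratio near the value expected from random labels suggests the structure carries little label signal.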
The transductive setting
Often you have one big graph where some nodes are labeled and most are not. Training and prediction happen on the same graph; you learn from the labeled nodes and propagate to the unlabeled ones. This differs from the usual supervised setup, where test examples are held out entirely: here the unlabeled nodes' features and edges are visible during training, and only their labels are hidden.
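In practice the transductive setting is commonly represented with boolean masks over the nodes of a single graph rather than with separate datasets. The sketch below assumes a hypothetical helper `make_transductive_split` and made-up labels; it only shows the bookkeeping, not a model.

```python
import random

def make_transductive_split(num_nodes, num_labeled, seed=0):
    """Return boolean masks over the nodes of ONE graph (no separate test graph)."""
    rng = random.Random(seed)
    labeled = set(rng.sample(range(num_nodes), num_labeled))
    train_mask = [i in labeled for i in range(num_nodes)]
    unlabeled_mask = [not m for m in train_mask]
    return train_mask, unlabeled_mask

train_mask, unlabeled_mask = make_transductive_split(num_nodes=10, num_labeled=3)

# Hypothetical ground truth; only the entries under train_mask are shown to the model.
labels = [0, 1, 0, 1, 1, 0, 0, 1, 0, 1]
visible_labels = [y if m else None for y, m in zip(labels, train_mask)]
print(visible_labels)
```

Every node keeps its features and edges during training. The masks decide only which labels contribute to the loss and which nodes are scored afterwards.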
How a GNN does it
- Run several message passing layers so each node absorbs its neighborhood.
- Attach a classifier head that maps the final node vector to label probabilities.
- Train with cross-entropy loss computed on the labeled nodes only, letting structure carry information to the rest.
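The three steps above can be sketched in plain Python. This is a deliberately simplified stand-in, not a full GNN: the message passing layers here are parameter-free neighbor averaging (no learned weights), and only the linear classifier head is trained. The graph, features, and all function names are hypothetical. Only nodes 0 and 5 have informative features and labels, so any correct prediction for the other nodes has to come from structure.

```python
import math

def mean_aggregate(features, neighbors):
    """One parameter-free message passing step: average each node with its neighbors."""
    out = []
    for i, nbrs in enumerate(neighbors):
        group = [i] + nbrs  # include the node itself (self-loop)
        dim = len(features[i])
        out.append([sum(features[j][d] for j in group) / len(group) for d in range(dim)])
    return out

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_proba(h, W, b):
    """Classifier head: linear map from a node vector to label probabilities."""
    logits = [sum(h[d] * W[d][c] for d in range(len(h))) + b[c] for c in range(len(b))]
    return softmax(logits)

def train_head(H, labels, train_mask, num_classes, lr=0.5, epochs=200):
    """Gradient descent on mean cross entropy over the labeled nodes only."""
    dim = len(H[0])
    W = [[0.0] * num_classes for _ in range(dim)]
    b = [0.0] * num_classes
    labeled = [i for i, m in enumerate(train_mask) if m]
    for _ in range(epochs):
        gW = [[0.0] * num_classes for _ in range(dim)]
        gb = [0.0] * num_classes
        for i in labeled:  # unlabeled nodes never enter the loss
            p = predict_proba(H[i], W, b)
            for c in range(num_classes):
                err = p[c] - (1.0 if c == labels[i] else 0.0)
                gb[c] += err / len(labeled)
                for d in range(dim):
                    gW[d][c] += err * H[i][d] / len(labeled)
        for c in range(num_classes):
            b[c] -= lr * gb[c]
            for d in range(dim):
                W[d][c] -= lr * gW[d][c]
    return W, b

# Hypothetical toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
neighbors = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
features = [[1.0, 0.0], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.5, 0.5], [0.0, 1.0]]
labels = [0, None, None, None, None, 1]  # only nodes 0 and 5 are labeled
train_mask = [y is not None for y in labels]

H = features
for _ in range(2):  # two rounds of message passing
    H = mean_aggregate(H, neighbors)

W, b = train_head(H, labels, train_mask, num_classes=2)
preds = []
for h in H:
    p = predict_proba(h, W, b)
    preds.append(p.index(max(p)))
print(preds)
```

Before aggregation, nodes 1 to 4 share the same ambiguous feature vector and cannot be told apart. After two rounds of averaging, each one leans toward the labeled node in its own triangle, which is what lets the head separate them. A real GNN would additionally learn weight matrices inside each message passing layer.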
Key idea
Node classification labels graph nodes from features and structure, exploiting homophily and often training transductively on one partly labeled graph.