class: middle, title-slide

# Graph Neural Networks

## CDS DS 595

### Siddharth Mishra-Sharma

[smsharma.io/teaching/ds595-ai4science](https://smsharma.io/teaching/ds595-ai4science.html)

.small[📄 [Notes](https://bu-ds595.github.io/course-materials-spring26/notes/03-neural-networks.pdf)]

---

# Recap: MLPs

.cols[
.col-1-2[

**Fully connected:** every input connects to every output.

**No structure:** no assumptions about locality or ordering.

.center[.eq-box[
$h\_{\ell+1} = \sigma\left( W^{(\ell)}\, h\_{\ell} + b^{(\ell)} \right)$
]]

]
.col-1-2[
.center.width-100[]
]
]

---

# Recap: CNNs

.cols[
.col-1-2[

**Locality:** process small neighborhoods.

**Weight sharing:** reuse the same weights (kernel) at every position.

.center[.eq-box[
$h\_{i}^{(\ell+1)} = \sigma\left( \sum\_{j \in \text{neighbors}(i)} W^{(\ell)}\, h\_{j}^{(\ell)} \right)$
]]

.small[Neighbors = pixels in a local patch. Same $W$ everywhere.]

]
.col-1-2[
.center.width-90[]
]
]

---

class: center, middle, section-slide

# Part 1: From Grids to Graphs

---

# Graphs are everywhere

.center.width-80[]

A **graph** = **nodes** (entities) + **edges** (relationships).

Molecules, road networks, circuits, social networks, point clouds, protein structures.

---

# Representing a graph

.center.width-80[]

.small.muted.center[(a) A graph with 8 nodes. (b) Its adjacency matrix **A** — entry $A\_{ij} = 1$ if nodes $i$, $j$ are connected. (c) **A²** counts 2-hop paths.]

---

# Node ordering is arbitrary

.center.width-80[]

Same graph, different node numbering → different $A$ and $X$ matrices. The network must give the **same output** regardless.

---

class: center, middle, section-slide

# Part 2: Deep Sets

.small[Handling unordered collections]

---

# Sets: graphs without edges

.cols[
.col-1-2[

Given a set $\{x\_1, \ldots, x\_n\}$, how do we build a function whose output doesn't depend on the order we list the elements?

]
.col-1-2[
.center.width-100[]

.small.muted.center[N-body simulation: particles as an unordered point cloud in 3D.]
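.small[What "order-independent" means, in a toy NumPy sketch (names here are illustrative, not from the course notebook): embed each element, then sum.]

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))    # 5 particles, 3 features each
phi = np.tanh                  # stand-in per-element embedding

pooled = phi(x).sum(axis=0)    # embed each element, then sum
pooled_perm = phi(x[rng.permutation(5)]).sum(axis=0)
assert np.allclose(pooled, pooled_perm)  # same output for any ordering
```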
]
]

---

# The Deep Sets theorem

.cols[
.col-1-2[

Any order-independent function:

.center[.eq-box[
$f(\{x\_1, \ldots, x\_n\}) = \rho\left( \sum\_{i=1}^n \phi(x\_i) \right)$
]]

1. **Embed** each element with shared $\phi$
2. **Sum** (or mean, or max)
3. **Predict** with $\rho$

Why $\rho$? The sum is linear over embeddings — $\rho$ adds nonlinearity *after* aggregation, and maps to the output space.

]
.col-1-2[
.center.width-100[]
]
]

.footnote[Zaheer et al., "Deep Sets" (2017)]

---

class: center, middle, section-slide

# Part 3: Message Passing

.small[The core GNN operation]

---

# GCN: the simplest message passing

.cols[
.col-1-2[

Each node aggregates its neighbors and applies a linear layer:

.center[.eq-box[
$h\_v^{(\ell+1)} = \sigma\left( \sum\_{u \in \mathcal{N}(v)} c\_{vu}\, W^{(\ell)}\, h\_u^{(\ell)} \right)$
]]

$c\_{vu}$ is a **fixed** normalization constant (e.g. $1/\sqrt{|\mathcal{N}(v)|\,|\mathcal{N}(u)|}$).

In matrix form: $H^{(\ell+1)} = \sigma(\tilde{A}\, H^{(\ell)}\, W^{(\ell)})$ — the entire message-passing step is a matrix multiply.

]
.col-1-2[
.center.width-70[]
]
]

---

# GCN: growing receptive fields

Each node starts knowing only about itself.

After **1 layer** → knows about neighbors. After **2 layers** → neighbors of neighbors. After **$k$ layers** → $k$-hop neighborhood.

.center.width-80[]

.small.muted.center[UDL Fig 13.7: (a) Input graph with node embeddings. (b) After one layer. (c) After $k$ layers — embeddings encode graph context.]

---

# The general framework: message, aggregate, update

.center.width-60[]

.center[.eq-box[
$h\_v^{(\ell+1)} = \phi\left( h\_v^{(\ell)},\; \sum\_{u \in \mathcal{N}(v)} \psi(h\_v^{(\ell)}, h\_u^{(\ell)}, e\_{uv}) \right)$
]]

GCN is a special case: $\psi(h\_v, h\_u, e\_{uv}) = c\_{vu}\, W\, h\_u$ and $\phi = \sigma$.

---

class: center, middle, section-slide

# Part 4: GNNs for Molecules

.small[QM9 revisited]

---

# QM9: MLP vs GNN

.cols[
.col-1-2[

**MLP (L5):** Atom counts only. "3 C, 1 N, 2 O." No bonding.
**GNN:** Atoms = nodes, bonds = edges, 3D coordinates as node features. Message pass → pool → predict.

]
.col-1-2[
.center.width-100[]
]
]

---

# Coding session: GNN on QM9

.live-coding[

**Notebook:** [GNN on QM9](https://colab.research.google.com/github/bu-ds595/course-materials-spring26/blob/main/notebooks/qm9_gnn.ipynb)

Train a GNN on QM9 and compare to the MLP baseline from L5.

]

---

# Some neighbors are more equal than others

.cols[
.col-1-2[

Recall the GCN update:

$$h\_v^{(\ell+1)} = \sigma\left( \sum\_{u \in \mathcal{N}(v)} c\_{vu}\, W\, h\_u^{(\ell)} \right)$$

Every neighbor contributes equally. But some neighbors are more equal than others!

]
.col-1-2[
.center.width-50[]
]
]

--

What if we want each neighbor's contribution to depend on its content?

---

# Graph attention

.small.muted[— Veličković et al. (2018)]

.cols[
.col-1-2[

1. **Score:** learned similarity between node pairs
2. **Normalize:** softmax so weights sum to 1
3. **Aggregate:** same as GCN, but $c\_{vu} \to \alpha\_{vu}$

$$\alpha\_{vu} = \frac{\exp\big(f(Wh\_v,\, Wh\_u)\big)}{\sum\_{u' \in \mathcal{N}(v)} \exp\big(f(Wh\_v,\, Wh\_{u'})\big)}$$

.center[.eq-box[
$h\_v' = \sigma\left( \sum\_{u \in \mathcal{N}(v)} \alpha\_{vu}\, W\, h\_u \right)$
]]

]
.col-1-2[
.center.width-90[]
]
]

---

# GNNs in science

.cols[
.col-1-2[
.center.width-100[]

.small.muted.center[Lam et al., "GraphCast" (2023)]
]
.col-1-2[

**Molecules and materials**
- Drug discovery, catalyst design
- ML force fields for molecular dynamics

**Proteins**
- AlphaFold: graph-like attention over residues

**Physics**
- Jet tagging at the LHC (particles as nodes)
- N-body dynamics with learned interactions
- Weather forecasting (GraphCast)

]
]

---

# Limitations

.cols[
.col-1-2[

**Oversmoothing:** Too many layers → all nodes converge to the same representation.

.small[Most GNNs use 3–6 layers.]

]
.col-1-2[
.center.width-100[]
]
]

--

**Missing geometry:** Standard GNNs see only connectivity; 3D coordinates enter as plain node features, with no built-in notion of rotation or translation.
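.small[The oversmoothing point above, in a toy NumPy sketch (hypothetical 4-node path graph; pure neighbor averaging, no weights or nonlinearity): repeated averaging drives all node features together.]

```python
import numpy as np

# Path graph on 4 nodes, self-loops added, rows normalized to average
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
A_hat = A / A.sum(axis=1, keepdims=True)

H = np.eye(4)           # start with distinct one-hot node features
for _ in range(50):     # 50 rounds of neighbor averaging
    H = A_hat @ H

# Per-feature spread across nodes has collapsed toward zero
assert H.std(axis=0).max() < 1e-3
```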
---

# Next time

**Geometric neural networks:** Building in 3D structure.

.center.width-70[]
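---

# Backup: a GCN layer in NumPy

.small[A minimal sketch of the matrix form $H^{(\ell+1)} = \sigma(\tilde{A}\, H^{(\ell)}\, W^{(\ell)})$; illustrative only, not the notebook's implementation.]

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN step: relu(A_hat @ H @ W) with symmetric normalization."""
    A_tilde = A + np.eye(len(A))            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ H @ W, 0.0)   # ReLU nonlinearity

# Triangle graph, 2 input features, 4 hidden units
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H, W = rng.normal(size=(3, 2)), rng.normal(size=(2, 4))

out = gcn_layer(A, H, W)                    # shape (3, 4)

# Permutation equivariance: relabel nodes, outputs relabel the same way
P = np.eye(3)[[2, 0, 1]]
assert np.allclose(gcn_layer(P @ A @ P.T, P @ H, W), P @ out)
```

.small[Relabeling the nodes permutes the output rows the same way; the invariance from Part 1 holds by construction.]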