class: middle, title-slide

# Graph Neural Networks

## CDS DS 595

### Siddharth Mishra-Sharma

[smsharma.io/teaching/ds595-ai4science](https://smsharma.io/teaching/ds595-ai4science.html)

.small[📄 [Notes](https://bu-ds595.github.io/course-materials-spring26/notes/03-neural-networks.pdf)]

---

# Recap: MLPs

.cols[
.col-1-2[

**Fully connected:** every input connects to every output.

**No structure:** no assumptions about locality or ordering.

.center[.eq-box[
$h\_{\ell+1} = \sigma\left( W^{(\ell)}\, h\_{\ell} + b^{(\ell)} \right)$
]]

]
.col-1-2[
.center.width-100[]
]
]

---

# Recap: CNNs

.cols[
.col-1-2[

**Locality:** process small neighborhoods.

**Weight sharing:** reuse the same weights (kernel) at every position.

.center[.eq-box[
$h\_{i}^{(\ell+1)} = \sigma\left( \sum\_{j \in \text{neighbors}(i)} W^{(\ell)}\, h\_{j}^{(\ell)} \right)$
]]

.small[Neighbors = pixels in a local patch. Same $W$ everywhere.]

]
.col-1-2[
.center.width-90[]
]
]

---

class: center, middle, section-slide

# Part 1: From Grids to Graphs

---

# Graphs are everywhere

.center.width-80[]

A **graph** = **nodes** (entities) + **edges** (relationships).

Molecules, road networks, circuits, social networks, point clouds, protein structures.

---

# Representing a graph

.center.width-80[]

.small.muted.center[(a) A graph with 8 nodes. (b) Its adjacency matrix **A** — entry $A\_{ij} = 1$ if nodes $i$, $j$ are connected. (c) **A²** counts 2-hop paths.]

---

# Node ordering is arbitrary

.center.width-80[]

Same graph, different node numbering → different $A$ and $X$ matrices. The network must give the **same output** regardless.

---

class: center, middle, section-slide

# Part 2: Deep Sets

.small[Handling unordered collections]

---

# Sets: graphs without edges

.cols[
.col-1-2[

Given a set $\{x\_1, \ldots, x\_n\}$, how do we build a function whose output doesn't depend on the order we list the elements?

]
.col-1-2[
.center.width-100[]

.small.muted.center[N-body simulation: particles as an unordered point cloud in 3D.]
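.small[What "order-independent" means, in a toy NumPy sketch (names here are illustrative, not from the course notebook): embed each element, then sum.]

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))    # 5 particles, 3 features each
phi = np.tanh                  # stand-in per-element embedding

pooled = phi(x).sum(axis=0)    # embed each element, then sum
pooled_perm = phi(x[rng.permutation(5)]).sum(axis=0)
assert np.allclose(pooled, pooled_perm)  # same output for any ordering
```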
]
]

---

# The Deep Sets theorem

.cols[
.col-1-2[

Any order-independent function:

.center[.eq-box[
$f(\{x\_1, \ldots, x\_n\}) = \rho\left( \sum\_{i=1}^n \phi(x\_i) \right)$
]]

1. **Embed** each element with shared $\phi$
2. **Sum** (or mean, or max)
3. **Predict** with $\rho$

Why $\rho$? The sum is linear over embeddings — $\rho$ adds nonlinearity *after* aggregation, and maps to the output space.

]
.col-1-2[
.center.width-100[]
]
]

.footnote[Zaheer et al., "Deep Sets" (2017)]

---

class: center, middle, section-slide

# Part 3: Message Passing

.small[The core GNN operation]

---

# GCN: the simplest message passing

.cols[
.col-1-2[

Each node aggregates its neighbors and applies a linear layer:

.center[.eq-box[
$h\_v^{(\ell+1)} = \sigma\left( \sum\_{u \in \mathcal{N}(v)} c\_{vu}\, W^{(\ell)}\, h\_u^{(\ell)} \right)$
]]

$c\_{vu}$ is a **fixed** normalization constant (e.g. $1/\sqrt{|\mathcal{N}(v)|\,|\mathcal{N}(u)|}$).

In matrix form: $H^{(\ell+1)} = \sigma(\tilde{A}\, H^{(\ell)}\, W^{(\ell)})$ — the entire message-passing step is a matrix multiply.

]
.col-1-2[
.center.width-70[]
]
]

---

# GCN: growing receptive fields

Each node starts knowing only about itself.

After **1 layer** → knows about neighbors. After **2 layers** → neighbors of neighbors. After **$k$ layers** → $k$-hop neighborhood.

.center.width-80[]

.small.muted.center[UDL Fig 13.7: (a) Input graph with node embeddings. (b) After one layer. (c) After $k$ layers — embeddings encode graph context.]

---

# The general framework: message, aggregate, update

.center.width-60[]

.center[.eq-box[
$h\_v^{(\ell+1)} = \phi\left( h\_v^{(\ell)},\; \sum\_{u \in \mathcal{N}(v)} \psi(h\_v^{(\ell)}, h\_u^{(\ell)}, e\_{uv}) \right)$
]]

GCN is a special case: $\psi(h\_v, h\_u, e\_{uv}) = c\_{vu}\, W\, h\_u$ and $\phi = \sigma$.

---

class: center, middle, section-slide

# Part 4: GNNs for Molecules

.small[QM9 revisited]

---

# QM9: MLP vs GNN

.cols[
.col-1-2[

**MLP (L5):** Atom counts only. "3 C, 1 N, 2 O." No bonding.
**GNN:** Atoms = nodes, bonds = edges, 3D coordinates as node features. Message pass → pool → predict.

]
.col-1-2[
.center.width-100[]
]
]

---

# Coding session: GNN on QM9

.live-coding[

**Notebook:** [GNN on QM9](https://colab.research.google.com/github/bu-ds595/course-materials-spring26/blob/main/notebooks/qm9_gnn.ipynb)

Train a GNN on QM9 and compare to the MLP baseline from L5.

]

---

# Some neighbors are more equal than others

.cols[
.col-1-2[

Recall the GCN update:

$$h\_v^{(\ell+1)} = \sigma\left( \sum\_{u \in \mathcal{N}(v)} c\_{vu}\, W\, h\_u^{(\ell)} \right)$$

Every neighbor contributes equally. But some neighbors are more equal than others!

]
.col-1-2[
.center.width-50[]
]
]

--

What if we want each neighbor's contribution to depend on its content?

---

# Graph attention

.small.muted[— Veličković et al. (2018)]

.cols[
.col-1-2[

1. **Score:** learned similarity between node pairs
2. **Normalize:** softmax so weights sum to 1
3. **Aggregate:** same as GCN, but $c\_{vu} \to \alpha\_{vu}$

$$\alpha\_{vu} = \frac{\exp\big(f(Wh\_v,\, Wh\_u)\big)}{\sum\_{u' \in \mathcal{N}(v)} \exp\big(f(Wh\_v,\, Wh\_{u'})\big)}$$

.center[.eq-box[
$h\_v' = \sigma\left( \sum\_{u \in \mathcal{N}(v)} \alpha\_{vu}\, W\, h\_u \right)$
]]

]
.col-1-2[
.center.width-90[]
]
]

---

# GNNs in science

.cols[
.col-1-2[
.center.width-100[]

.small.muted.center[Lam et al., "GraphCast" (2023)]
]
.col-1-2[

**Molecules and materials**
- Drug discovery, catalyst design
- ML force fields for molecular dynamics

**Proteins**
- AlphaFold: graph-like attention over residues

**Physics**
- Jet tagging at the LHC (particles as nodes)
- N-body dynamics with learned interactions
- Weather forecasting (GraphCast)

]
]

---

# Limitations

.cols[
.col-1-2[

**Oversmoothing:** Too many layers → all nodes converge to the same representation.

.small[Most GNNs use 3–6 layers.]

]
.col-1-2[
.center.width-100[]
]
]

--

**Missing geometry:** Standard GNNs see only connectivity; 3D coordinates enter as plain node features, with no built-in notion of rotation or translation.
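.small[The oversmoothing point above, in a toy NumPy sketch (hypothetical 4-node path graph; pure neighbor averaging, no weights or nonlinearity): repeated averaging drives all node features together.]

```python
import numpy as np

# Path graph on 4 nodes, self-loops added, rows normalized to average
A = np.array([[1, 1, 0, 0],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [0, 0, 1, 1]], dtype=float)
A_hat = A / A.sum(axis=1, keepdims=True)

H = np.eye(4)           # start with distinct one-hot node features
for _ in range(50):     # 50 rounds of neighbor averaging
    H = A_hat @ H

# Per-feature spread across nodes has collapsed toward zero
assert H.std(axis=0).max() < 1e-3
```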
---

# Next time

**Geometric neural networks:** Building in 3D structure.

.center.width-70[]
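---

# Backup: a GCN layer in NumPy

.small[A minimal sketch of the matrix form $H^{(\ell+1)} = \sigma(\tilde{A}\, H^{(\ell)}\, W^{(\ell)})$; illustrative only, not the notebook's implementation.]

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN step: relu(A_hat @ H @ W) with symmetric normalization."""
    A_tilde = A + np.eye(len(A))            # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    A_hat = A_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_hat @ H @ W, 0.0)   # ReLU nonlinearity

# Triangle graph, 2 input features, 4 hidden units
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
H, W = rng.normal(size=(3, 2)), rng.normal(size=(2, 4))

out = gcn_layer(A, H, W)                    # shape (3, 4)

# Permutation equivariance: relabel nodes, outputs relabel the same way
P = np.eye(3)[[2, 0, 1]]
assert np.allclose(gcn_layer(P @ A @ P.T, P @ H, W), P @ out)
```

.small[Relabeling the nodes permutes the output rows the same way; the invariance from Part 1 holds by construction.]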