class: middle, title-slide # Symmetry-Preserving Neural Networks ## CDS DS 595 ### Siddharth Mishra-Sharma [smsharma.io/teaching/ds595-ai4science](https://smsharma.io/teaching/ds595-ai4science.html) --- # Logistics 1. **Assignment 1:** due tomorrow (Wed Feb 18) 2. **Assignment 2:** released tomorrow (Wed Feb 18), due Wed Mar 4 3. **Office hours:** today (Tue) 3–5pm, CDS 1528 --- # Mildred Dresselhaus (1930–2017) .cols[ .col-1-2[ .center.width-60[] ] .col-1-2[ Physicists like Dresselhaus showed that **encoding symmetry** (via group theory) into your analysis was what made intractable problems tractable. .small.muted[Spiritual connection to today's lecture!] Predicted that a nanotube's electronic behavior depends on how the sheet is rolled up: **symmetry → physics**. **First woman tenured at MIT.** ] ] --- # Recap: MLPs .cols[ .col-1-2[ **Fully connected:** every input connects to every output. **No structure:** no assumptions about the data. .center[.eq-box[ $h\_{\ell+1} = \sigma\left( W^{(\ell)}\, h\_{\ell} + b^{(\ell)} \right)$ ]] ] .col-1-2[ .center.width-100[] ] ] --- # Recap: Inductive biases .cols[ .col-1-2[ An **inductive bias** = assumption built into the model. Many hypotheses fit the training data. The inductive bias determines which ones we **prefer**. The right inductive bias makes learning **much** easier. ] .col-1-2[ .center[  ] ] ] --- # Recap: Deep Sets .cols[ .col-1-2[ **Permutation invariance:** reorder the elements, output doesn't change. .center[.eq-box[ $f(\{x\_1, \ldots, x\_n\}) = \rho\left( \sum\_{i=1}^n \phi(x\_i) \right)$ ]] The sum doesn't care about order. ] .col-1-2[ .center.width-100[] ] ] --- # Recap: CNNs .cols[ .col-1-2[ **Translation equivariance:** shift the input, the output shifts the same way. Same kernel everywhere — the network processes every position identically. 
.center[.eq-box[ $z\_i = \sum\_{j \in \text{patch}(i)} w\_j \, x\_{i+j}$ ]] ] .col-1-2[ .center.width-90[] ] ] --- # Recap: GNNs .cols[ .col-1-2[ **Permutation equivariance:** relabel the nodes, the per-node outputs relabel the same way. .center[.eq-box[ $h\_v^{(\ell+1)} = \phi\big( h\_v^{(\ell)},\; \textstyle\sum\_{u \in \mathcal{N}(v)} \psi(h\_v^{(\ell)}, h\_u^{(\ell)}) \big)$ ]] Sum over neighbors is order-independent. ] .col-1-2[ .center.width-70[] ] ] --- class: center, middle, section-slide # Invariance and Equivariance --- # Invariance .cols[ .col-1-2[ Transform the input — output doesn't change. .center[.eq-box[ $f(g \cdot x) = f(x)$ ]] Energy, charge, binding affinity, mass — all scalars. .highlight[ **Scalars are invariant.** Rotate a molecule: same energy. ] ] .col-1-2[ .center.width-100[] ] ] --- # Equivariance .cols[ .col-1-2[ Transform the input — output transforms the same way. .center[.eq-box[ $f(g \cdot x) = g \cdot f(x)$ ]] Forces, velocities, dipole moments — all vectors. .highlight[ **Vectors are equivariant.** Rotate a molecule: force vectors rotate too. ] ] .col-1-2[ .center.width-100[] ] ] --- # Symmetry transformations .center.width-80[] --- # The symmetry groups | Group | Transformations | |---|---| | $S\_n$ | Permutations of $n$ objects | | $\mathrm{SO}(3)$ | Rotations in 3D | | $\mathrm{O}(3)$ | Rotations + reflections | | $\mathrm{SE}(3)$ | Rotations + translations | | $\mathrm{E}(3)$ | **Rotations + translations + reflections** | Most molecular properties: $\mathrm{E}(3)$-invariant (energy) or $\mathrm{E}(3)$-equivariant (forces). .small[Some quantities like chirality distinguish mirror images, requiring $\mathrm{SE}(3)$ instead.] 
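---

# Symmetries, numerically

The two definitions above can be unit-tested directly. A minimal NumPy sketch with toy `energy` (scalar) and `forces` (vector) functions, invented here for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random rotation R in SO(3): orthogonalize, then fix the sign of det(R).
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q * np.sign(np.linalg.det(Q))

x = rng.normal(size=(5, 3))  # five points in 3D

def energy(x):
    # Scalar built from pairwise distances: rotation-invariant.
    return np.linalg.norm(x[:, None] - x[None, :], axis=-1).sum()

def forces(x):
    # Vectors pointing at the centroid: rotation-equivariant.
    return x.mean(axis=0) - x

assert np.isclose(energy(x @ R.T), energy(x))         # f(Rx) = f(x)
assert np.allclose(forces(x @ R.T), forces(x) @ R.T)  # f(Rx) = R f(x)
```

The same two assertions make a good regression test for any architecture claiming a symmetry: rotate the input, compare outputs.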
--- # From GNNs to geometric GNNs .cols[ .col-1-2[ Recall message passing: $$h\_v^{(\ell+1)} = \phi\big( h\_v^{(\ell)},\; \textstyle\sum\_{u \in \mathcal{N}(v)} \psi(h\_v^{(\ell)}, h\_u^{(\ell)}) \big)$$ ] .col-1-2[ .center.width-90[] ] ] --- count: false # From GNNs to geometric GNNs .cols[ .col-1-2[ Recall message passing: $$h\_v^{(\ell+1)} = \phi\big( h\_v^{(\ell)},\; \textstyle\sum\_{u \in \mathcal{N}(v)} \psi(h\_v^{(\ell)}, h\_u^{(\ell)}) \big)$$ What **geometric information** should enter the messages? .center[.eq-box[ $m\_{ij} = \psi\big(h\_i,\; h\_j,\; \underbrace{??\vphantom{d}}\_{\text{geometry}}\big)$ ]] ] .col-1-2[ .center.width-90[] ] ] --- # The geometric information hierarchy | Messages use | Symmetry | |---|---| | Nothing geometric | Permutation only (standard GNN) | | Relative positions $r\_i - r\_j$ | + Translation invariance | | Distances $\lVert r\_i - r\_j \rVert$ | + Rotation invariance | | Distances + relative vectors for coordinate updates | + Rotation **equivariance** | .highlight[ Each row restricts the geometric input further, gaining a symmetry — and shrinking the space of functions the network can represent. ] --- # SchNet: distance-based message passing .cols[ .col-1-2[ .center[.eq-box[ $h\_i' = h\_i + \sum\_{j \in \mathcal{N}(i)} \phi(h\_j) \cdot w(d\_{ij})$ ]] Neighbor features weighted by a learned function of distance. Distances discard directional information — two neighbors at the same distance but different angles are indistinguishable. .warning[How do we recover angular information?] ] .col-1-2[ .center.width-100[] .small.muted.center[General message passing. SchNet specializes it: messages depend only on distance.] ] ] .footnote[Schütt et al., "SchNet" (2017)] --- # Encoding distances: radial basis functions A single scalar $d$ doesn't give the network much to work with. Expand it: $$e\_k(d) = \exp\left( -\gamma\, (d - \mu\_k)^2 \right)$$ .center.width-80[] .small[Centers $\mu\_k$ spaced from 0 to a cutoff. 
Each Gaussian "activates" for distances near its center. One scalar becomes a $K$-dimensional vector.] --- # Towards equivariance .cols[ .col-1-2[ Distance-based networks are rotation-invariant — but they can only predict **scalars**. What if you need **forces**, **velocities**, **dipole moments**? These are **vectors** — they should rotate with the input. Distances alone can't give you that. ] .col-1-2[ .center.width-100[] ] ] --- # Nonlinearities on vectors .center.width-60[] --- count: false # Nonlinearities on vectors Three operations that **do** commute with rotation: .center.width-80[] .highlight[ Any function built from these three operations is automatically equivariant. ] --- # A recipe for equivariant networks .center.width-80[] .highlight[ Nonlinearity lives entirely in scalar-space. The vector pathway stays linear — just scaling and adding — which is all it needs to stay equivariant. ] --- # EGNN: equivariant message passing .cols[ .col-1-2[ Each node has scalar features $h\_i$ and coordinates $x\_i$. A layer updates both: **1.** $m\_{ij} = \phi\_e(h\_i, h\_j, d\_{ij}^2)$ .muted[scalar messages from invariant inputs] **2.** $x\_i' = x\_i + \textstyle\sum\_j (x\_i - x\_j)\,\phi\_x(m\_{ij})$ .muted[coordinate update: scalar $\times$ relative vector] **3.** $h\_i' = \phi\_h(h\_i, \textstyle\sum\_j m\_{ij})$ .muted[feature update from aggregated messages] ] .col-1-2[ .center.width-100[] ] ] .footnote[Satorras et al., "EGNN" (2021)] --- # EGNN: equivariant message passing .center.width-90[] .footnote[Satorras et al., "EGNN" (2021)] --- # EGNN with velocities .cols[ .col-1-2[ Each node can also carry a **velocity** $v\_i$ — another equivariant vector. $$v\_i' = \phi\_v(h\_i)\, v\_i + \textstyle\sum\_j (x\_i - x\_j)\,\phi\_x(m\_{ij})$$ $$x\_i' = x\_i + v\_i'$$ Useful for dynamics, where atoms have momenta. 
] .col-1-2[ .center.width-100[] ] ] .footnote[Satorras et al., "EGNN" (2021)] --- # Beyond scalars and vectors EGNN uses $L=0$ (scalars) and $L=1$ (vectors). But some local environments can only be distinguished with **higher-order features**. .center.width-50[] --- count: false # Beyond scalars and vectors .center.width-70[] Spherical harmonics $Y\_l^m$ are a basis for directional information — each $L$ block transforms predictably under rotation, which is why equivariant networks use them. Models using higher-order features ($L \geq 2$) are significantly more data-efficient. --- class: center, middle, section-slide # Applications --- # Geometric GNNs for atomic systems .center.width-90[] .small.muted[Duval et al., "A Hitchhiker's Guide to Geometric GNNs for 3D Atomic Systems" (2023)] --- # Molecular dynamics .cols[ .col-1-2[ $N$ atoms with positions $\{x\_i\}$, evolving under Newton's equations: $$m\_i \ddot{x}\_i = F\_i = -\nabla\_{x\_i} E(\{x\_j\})$$ $E(\{x\_j\})$ is the **potential energy surface**: all $N$ positions in, one scalar out. The force on each atom is the negative gradient — it captures the influence of every other atom. ] .col-1-2[ .center.width-90[] .small.muted.center[Protein in water] ] ] --- # Where forces come from .center.width-70[] .small.muted[Friederich et al., Nature Materials (2021)] --- # Symmetry and the gradient trick Energy is invariant: $\quad E(\{x\_i\}) = E(\{Rx\_i + t\})$ Forces are equivariant: $\quad F\_i(\{Rx\_j\}) = R\, F\_i(\{x\_j\})$ Train an **invariant** network to predict $E$, then get forces by automatic differentiation: .center[.eq-box[ $F\_i = -\nabla\_{x\_i} E$ ]] If $E$ is rotation-invariant, $F\_i$ is **automatically** rotation-equivariant — equivariance for free.
.center.width-60[] .small.muted[Batzner et al., NequIP (2022). Higher-order equivariant features ($L \geq 1$) achieve lower error at every training set size.] --- # Compute efficiency Even with infinite data, equivariant models use compute more efficiently. .center.width-30[] Both follow power-law scaling, but equivariant (red) maintains a consistent advantage at every compute budget. .small.muted[Brehmer et al. (2025)] --- # The alternative: data augmentation Instead of encoding symmetry, **train on transformed copies** of the data so the network learns invariance from examples. .center.width-50[] --- # The optimization perspective .cols[ .col-1-2[ Constraining the function space of the neural network can make optimization more challenging! A more general model has "more room to breathe" and can be easier to optimize. A tradeoff to consider. ] .col-1-2[ .center.width-120[] ] ] --- # Sometimes (?) the bitter lesson wins .cols[ .col-1-2[ AlphaFold 2 used SE(3)-equivariant layers throughout. AlphaFold 3 **dropped equivariance** in favor of a diffusion-based architecture with data augmentation. Apparently equivariance was simply not needed; simplicity won out. ] .col-1-2[ .center.width-100.shadow[] .small.muted.center[Abramson et al., Nature (2024)] ] ] --- # Summary .highlight[ **Invariance**: $f(g \cdot x) = f(x)$ **Equivariance**: $f(g \cdot x) = g \cdot f(x)$ ] .cols[ .col-1-2[ .center.width-100[] .center.small[Equivariant message passing updates both scalar features and coordinates.] ] .col-1-2[ .center.width-100[] .center.small[Built-in symmetry often means less data needed to reach the same accuracy.] ] ] --- # Next time: Generative models .center[
] .center.small.muted[Seedance 2.0 video gen]
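---

# Appendix: radial basis expansion in code

A minimal sketch of the Gaussian distance expansion from the SchNet slides. `n_rbf`, the cutoff, and `gamma` here are illustrative choices, not SchNet's published values:

```python
import numpy as np

def rbf_expand(d, n_rbf=16, cutoff=5.0, gamma=10.0):
    # Centers mu_k spaced evenly from 0 to the cutoff.
    mu = np.linspace(0.0, cutoff, n_rbf)
    # e_k(d) = exp(-gamma (d - mu_k)^2): one scalar -> K-dimensional vector.
    return np.exp(-gamma * (d[..., None] - mu) ** 2)

d = np.array([0.9, 1.6, 3.2])  # three pairwise distances
e = rbf_expand(d)              # shape (3, 16)
```

Each row peaks at the Gaussian whose center is nearest the input distance, giving the network a smooth, localized encoding to learn weights over instead of a single raw scalar.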
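---

# Appendix: the gradient trick in code

A sketch of "equivariance for free" in PyTorch. The toy pairwise-distance energy stands in for a trained invariant network:

```python
import torch

def energy(x):
    # Rotation-invariant toy energy: a function of pairwise distances only.
    return (torch.cdist(x, x) ** 2).sum()

def forces(x):
    # F_i = -dE/dx_i via automatic differentiation.
    x = x.clone().requires_grad_(True)
    return -torch.autograd.grad(energy(x), x)[0]

x = torch.randn(5, 3)

# Random rotation: orthogonalize, then fix the sign of the determinant.
Q, _ = torch.linalg.qr(torch.randn(3, 3))
R = Q * torch.sign(torch.linalg.det(Q))

# Invariant E  =>  equivariant F, with no equivariant layers anywhere.
assert torch.allclose(forces(x @ R.T), forces(x) @ R.T, atol=1e-4)
```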