Does Equivariance Matter at Scale?

Abstract

Given large datasets and sufficient compute, is it beneficial to design neural architectures for the structure and symmetries of each problem? Or is it more efficient to learn them from data? We study empirically how equivariant and non-equivariant networks scale with compute and training samples. Focusing on a benchmark problem of rigid-body interactions and on general-purpose transformer architectures, we perform a series of experiments, varying the model size, training steps, and dataset size. We find evidence for three conclusions. First, equivariance improves data efficiency, but training non-equivariant models with data augmentation can close this gap given sufficient epochs. Second, scaling with compute follows a power law, with equivariant models outperforming non-equivariant ones at each tested compute budget. Finally, the optimal allocation of a compute budget between model size and training duration differs between equivariant and non-equivariant models.
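To make the second conclusion concrete, a power-law scaling relation of the form loss ≈ a · C^(−b) (with C the compute budget) becomes a straight line in log-log space and can be fitted with ordinary least squares. The sketch below is purely illustrative and uses made-up numbers; it is not the paper's data or fitting procedure.

```python
# Illustrative sketch only: fitting a power law loss ≈ a * C^(-b)
# to synthetic (compute, loss) pairs. All values are hypothetical.
import numpy as np

# Synthetic compute budgets (FLOPs) and corresponding test losses.
compute = np.array([1e15, 1e16, 1e17, 1e18, 1e19])
loss = np.array([0.50, 0.31, 0.19, 0.12, 0.075])

# A power law is linear in log-log space: log(loss) = log(a) - b * log(C).
slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
a, b = np.exp(intercept), -slope

print(f"Fitted power law: loss ≈ {a:.3g} * C^(-{b:.3f})")
# Extrapolate to a larger compute budget (still purely illustrative).
print(f"Predicted loss at 1e20 FLOPs: {a * 1e20 ** (-b):.4f}")
```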

Cite

Text

Brehmer et al. "Does Equivariance Matter at Scale?" Transactions on Machine Learning Research, 2025.

Markdown

[Brehmer et al. "Does Equivariance Matter at Scale?" Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/brehmer2025tmlr-equivariance/)

BibTeX

@article{brehmer2025tmlr-equivariance,
  title     = {{Does Equivariance Matter at Scale?}},
  author    = {Brehmer, Johann and Behrends, Sönke and De Haan, Pim and Cohen, Taco},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/brehmer2025tmlr-equivariance/}
}