Does Equivariance Matter at Scale?
Abstract
Given large data sets and sufficient compute, is it beneficial to design neural architectures for the structure and symmetries of a problem, or is it more efficient to learn them from data? We study empirically how equivariant and non-equivariant networks scale with compute and training samples. Focusing on a benchmark problem of rigid-body interactions and on general-purpose transformer architectures, we perform a series of experiments, varying the model size, training steps, and dataset size. We find evidence for three conclusions. First, equivariance improves data efficiency, but training non-equivariant models with data augmentation closes this gap. Second, scaling with compute follows a power law, with equivariant models outperforming non-equivariant ones at each tested compute budget. Finally, the optimal allocation of a compute budget across model size and training duration differs between equivariant and non-equivariant models.
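To make the compute-scaling claim concrete, here is a minimal sketch of how a power law of test loss versus training compute could be fitted in log-log space. The data points and constants are made-up placeholders for illustration, not results from the paper.

```python
# Minimal sketch (not the paper's code): fitting a power law L(C) = a * C^(-b)
# to test loss as a function of training compute. All numbers below are
# hypothetical placeholders, not measurements from the paper.
import numpy as np

compute = np.array([1e15, 1e16, 1e17, 1e18, 1e19])  # hypothetical FLOP budgets
loss = np.array([0.52, 0.31, 0.19, 0.11, 0.07])     # hypothetical test losses

# A power law is a straight line in log-log space: log L = log a - b * log C.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), deg=1)
a, b = np.exp(intercept), -slope

print(f"fitted scaling law: L(C) ~ {a:.3g} * C^({-b:.3f})")
```

Comparing fitted exponents and offsets for equivariant and non-equivariant models at matched compute budgets is one way to read off which family scales more favorably.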
Cite
Text
Brehmer et al. "Does Equivariance Matter at Scale?" NeurIPS 2024 Workshops: NeurReps, 2024.
Markdown
[Brehmer et al. "Does Equivariance Matter at Scale?" NeurIPS 2024 Workshops: NeurReps, 2024.](https://mlanthology.org/neuripsw/2024/brehmer2024neuripsw-equivariance/)
BibTeX
@inproceedings{brehmer2024neuripsw-equivariance,
title = {{Does Equivariance Matter at Scale?}},
author = {Brehmer, Johann and Behrends, Sönke and De Haan, Pim and Cohen, Taco},
booktitle = {NeurIPS 2024 Workshops: NeurReps},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/brehmer2024neuripsw-equivariance/}
}