Soft Equivariance Regularization for Invariant Self-Supervised Learning

Abstract

Self-supervised learning (SSL) typically learns representations invariant to semantic-preserving augmentations (e.g., random crops and photometric jitter). While effective for recognition, enforcing strong invariance can suppress transformation-dependent structure that is useful for robustness to geometric perturbations and spatially sensitive transfer. A growing body of work, therefore, augments invariance-based SSL with equivariance objectives, but these objectives are often imposed on the same **final** (typically spatially-collapsed) representation. We empirically observe a trade-off in this coupled setting: pushing equivariance regularization toward deeper layers improves equivariance scores but degrades ImageNet-1k linear evaluation, motivating a layer-decoupled design. Motivated by this trade-off, we propose **Soft Equivariance Regularization (SER)**, a plug-in regularizer that decouples where invariance and equivariance are enforced: we keep the base SSL objective unchanged on the final embedding, while softly encouraging equivariance on an \emph{intermediate spatial token map} via analytically specified group actions $\rho_g$ (e.g., $90^{\circ}$ rotations, flips, and scaling) applied directly in feature space. SER learns/predicts no per-sample transformation codes/labels, requires no auxiliary transformation-prediction head, and adds only **1.008$\times$** training FLOPs. On ImageNet-1k ViT-S/16 pretraining, SER improves MoCo-v3 by **+0.84** Top-1 in linear evaluation under a strictly matched 2-view setting and consistently improves DINO and Barlow Twins; under matched view counts, SER achieves the best ImageNet-1k linear-eval Top-1 among the compared invariance+equivariance add-ons. SER further improves ImageNet-C/P by **+1.11/+1.22** Top-1 and frozen-backbone COCO detection by **+1.7** mAP. Finally, applying the same **layer-decoupling** recipe to existing invariance+equivariance baselines (e.g., EquiMod and AugSelf) improves their accuracy, suggesting layer decoupling as a general design principle for combining invariance and equivariance. Code is available at https://github.com/aitrics-chris/SER.

Cite

Text

Lee et al. "Soft Equivariance Regularization for Invariant Self-Supervised Learning." International Conference on Learning Representations, 2026.

Markdown

[Lee et al. "Soft Equivariance Regularization for Invariant Self-Supervised Learning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/lee2026iclr-soft/)

BibTeX

@inproceedings{lee2026iclr-soft,
  title     = {{Soft Equivariance Regularization for Invariant Self-Supervised Learning}},
  author    = {Lee, Joohyung and Kim, Changhun and Kim, Hyunsu and Lee, Kwanhyung and Lee, Juho},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/lee2026iclr-soft/}
}