Dual Perspectives on Non-Contrastive Self-Supervised Learning

Abstract

The {\em stop gradient} and {\em exponential moving average} iterative procedures are commonly used in non-contrastive approaches to self-supervised learning to avoid representation collapse, with excellent performance in downstream applications in practice. This presentation investigates these procedures from the dual viewpoints of optimization and dynamical systems. We show that, in general, although they {\em do not} optimize the original objective, or {\em any} other smooth function, they {\em do} avoid collapse. Following [Tian et al. 2021], but without any of the extra assumptions used in their proofs, we then show using a dynamical system perspective that, in the linear case, minimizing the original objective function without the use of a stop gradient or exponential moving average {\em always} leads to collapse. Conversely, we characterize explicitly the equilibria of the dynamical systems associated with these two procedures in this linear setting as algebraic varieties in their parameter space, and show that they are, in general, {\em asymptotically stable}. Our theoretical findings are illustrated by empirical experiments with real and synthetic data.
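To make the two procedures named in the abstract concrete, here is a minimal sketch of a single update in a linear setting. It is an illustration under stated assumptions and is not taken from the paper: the encoder `W`, predictor `P`, squared-error loss, learning rate `lr`, momentum `tau`, and the function name `step` are all hypothetical choices. In both variants the target branch is treated as a constant when the gradient is computed. With stop gradient the target weights are then reset to the online weights, while with an exponential moving average they only drift toward them.

```python
import numpy as np

def step(W, P, W_target, x1, x2, lr=0.05, tau=None):
    """One update of a linear online encoder W and linear predictor P.

    The target branch (x2 @ W_target.T) is held constant, so no gradient
    flows through it. tau=None mimics stop gradient (target = online copy);
    a float tau in [0, 1] mimics an exponential moving average target.
    All names and the loss are assumptions made for this sketch.
    """
    n = x1.shape[0]
    z1 = x1 @ W.T                    # online embeddings, shape (n, k)
    target = x2 @ W_target.T         # target embeddings, treated as constant
    r = z1 @ P.T - target            # residual of loss 0.5 * mean ||P W x1 - target||^2
    grad_P = r.T @ z1 / n            # gradient w.r.t. P, shape (k, k)
    grad_W = P.T @ (r.T @ x1) / n    # gradient w.r.t. W, online branch only, shape (k, d)
    W_new = W - lr * grad_W
    P_new = P - lr * grad_P
    if tau is None:
        W_target_new = W_new.copy()                       # stop gradient: share weights
    else:
        W_target_new = tau * W_target + (1 - tau) * W_new  # EMA of online weights
    return W_new, P_new, W_target_new

# Synthetic example: two noisy views of the same data.
rng = np.random.default_rng(0)
d, k, n = 8, 3, 64
x = rng.normal(size=(n, d))
x1 = x + 0.1 * rng.normal(size=(n, d))
x2 = x + 0.1 * rng.normal(size=(n, d))
W = 0.1 * rng.normal(size=(k, d))
P = np.eye(k)

W_sg, P_sg, T_sg = step(W, P, W, x1, x2)                       # stop-gradient variant
W_ema, P_ema, T_ema = step(W, P, W.copy(), x1, x2, tau=0.99)   # EMA variant
```

Because the update direction ignores the dependence of the target on the weights, it is not the gradient of the symmetric objective, which is the distinction the abstract draws between these procedures and direct minimization.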

Cite

Text

Ponce et al. "Dual Perspectives on Non-Contrastive Self-Supervised Learning." International Conference on Learning Representations, 2026.

Markdown

[Ponce et al. "Dual Perspectives on Non-Contrastive Self-Supervised Learning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/ponce2026iclr-dual/)

BibTeX

@inproceedings{ponce2026iclr-dual,
  title     = {{Dual Perspectives on Non-Contrastive Self-Supervised Learning}},
  author    = {Ponce, Jean and Terver, Basile and Hebert, Martial and Arbel, Michael},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/ponce2026iclr-dual/}
}