Connecting Joint-Embedding Predictive Architecture with Contrastive Self-Supervised Learning

Abstract

The Joint-Embedding Predictive Architecture (JEPA) has emerged as a significant method in unsupervised visual representation learning, extracting visual features from unlabeled images through a masking strategy. Despite its success, two primary limitations have been identified: the Exponential Moving Average (EMA) used in I-JEPA is ineffective at preventing entire collapse, and the I-JEPA prediction is inadequate for accurately learning the mean of patch representations. To address these challenges, this study introduces C-JEPA (Contrastive-JEPA), a framework that integrates the Image-based Joint-Embedding Predictive Architecture (I-JEPA) with the Variance-Invariance-Covariance Regularization (VICReg) strategy. The integration is designed to learn the variance and covariance needed to prevent entire collapse and to ensure invariance in the mean of augmented views, thereby overcoming both limitations. Through empirical and theoretical evaluations, our work demonstrates that C-JEPA significantly enhances the stability and quality of visual representation learning. When pre-trained on the ImageNet-1K dataset, C-JEPA converges faster and reaches better performance in both linear probing and fine-tuning.
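To make the variance, invariance, and covariance terms concrete, the following is a minimal sketch of a generic VICReg-style regularizer in plain Python. It is not the authors' C-JEPA implementation. The function names, the hinge target `gamma`, the stabilizer `eps`, and the weights `sim_w`, `var_w`, `cov_w` are assumptions taken from the commonly cited VICReg formulation. Each input is assumed to be a batch of embedding vectors, for example mean-pooled patch representations of one augmented view.

```python
import math


def covariance_matrix(z):
    """Unbiased covariance matrix (d x d) of a batch z given as n rows of d floats."""
    n, d = len(z), len(z[0])
    means = [sum(row[j] for row in z) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in z]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def variance_term(z, gamma=1.0, eps=1e-4):
    """Hinge penalty that is positive when a dimension's std falls below gamma."""
    cov = covariance_matrix(z)
    d = len(cov)
    return sum(max(0.0, gamma - math.sqrt(cov[j][j] + eps)) for j in range(d)) / d


def covariance_term(z):
    """Sum of squared off-diagonal covariances, scaled by the dimension d."""
    cov = covariance_matrix(z)
    d = len(cov)
    return sum(cov[i][j] ** 2 for i in range(d) for j in range(d) if i != j) / d


def invariance_term(z_a, z_b):
    """Mean squared Euclidean distance between paired embeddings of two views."""
    n = len(z_a)
    return sum(
        sum((a - b) ** 2 for a, b in zip(row_a, row_b))
        for row_a, row_b in zip(z_a, z_b)
    ) / n


def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0):
    """Weighted sum of the three terms (the weights here are assumed defaults)."""
    inv = invariance_term(z_a, z_b)
    var = variance_term(z_a) + variance_term(z_b)
    cov = covariance_term(z_a) + covariance_term(z_b)
    return sim_w * inv + var_w * var + cov_w * cov


# A fully collapsed batch (every embedding identical) is penalized by the
# variance term even though the two views agree perfectly.
collapsed = [[1.0, 2.0] for _ in range(4)]
# A spread-out, decorrelated batch incurs no penalty against itself.
spread = [[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0], [0.0, -2.0]]

print(vicreg_loss(collapsed, collapsed))
print(vicreg_loss(spread, spread))
```

In this sketch the invariance term alone cannot detect collapse, since identical embeddings match perfectly across views. The variance and covariance terms are what penalize the collapsed batch, which is the role the abstract attributes to the VICReg component.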

Cite

Text

Mo and Tong. "Connecting Joint-Embedding Predictive Architecture with Contrastive Self-Supervised Learning." Neural Information Processing Systems, 2024. doi:10.52202/079017-0077

Markdown

[Mo and Tong. "Connecting Joint-Embedding Predictive Architecture with Contrastive Self-Supervised Learning." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/mo2024neurips-connecting/) doi:10.52202/079017-0077

BibTeX

@inproceedings{mo2024neurips-connecting,
  title     = {{Connecting Joint-Embedding Predictive Architecture with Contrastive Self-Supervised Learning}},
  author    = {Mo, Shentong and Tong, Shengbang},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0077},
  url       = {https://mlanthology.org/neurips/2024/mo2024neurips-connecting/}
}