Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder

Abstract

Despite the practical importance of self-supervised learning (SSL) across a wide range of modalities, recent advances have focused primarily on a few well-curated domains, e.g., vision and language, and often rely on domain-specific knowledge. For example, the Masked Auto-Encoder (MAE) has become one of the popular architectures in these domains, but its potential in other modalities has been less explored. In this paper, we develop MAE as a unified, modality-agnostic SSL framework. To this end, we argue that meta-learning is key to interpreting MAE as a modality-agnostic learner, and we propose enhancements to MAE, motivated by jointly improving its SSL across diverse modalities; we call the resulting framework MetaMAE. Our key idea is to view the mask reconstruction of MAE as a meta-learning task: masked tokens are predicted by adapting the Transformer meta-learner through the amortization of unmasked tokens. Based on this novel interpretation, we propose to integrate two advanced meta-learning techniques. First, we adapt the amortized latent of the Transformer encoder using gradient-based meta-learning to enhance the reconstruction. Second, we maximize the alignment between the amortized and adapted latents through task contrastive learning, which guides the Transformer encoder to better encode task-specific knowledge. Our experiments demonstrate the superiority of MetaMAE on the modality-agnostic SSL benchmark DABS, where it significantly outperforms prior baselines.
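The meta-learning view described in the abstract can be illustrated with a toy sketch: unmasked tokens act as the task's support set, masked tokens as its query set, and a latent amortized from the support set is refined by one gradient step. Everything here is a hypothetical stand-in rather than the authors' implementation: the scalar "encoder" and "decoder" weights (`w_enc`, `w_dec`), the function names, the mask ratio, and the choice to compute the adaptation loss on the unmasked tokens are all assumptions made for illustration. The task contrastive step is not shown.

```python
import random


def split_support_query(tokens, mask_ratio=0.75, seed=0):
    """Randomly split token indices into unmasked (support) and masked (query) sets."""
    rng = random.Random(seed)
    indices = list(range(len(tokens)))
    rng.shuffle(indices)
    n_masked = round(len(tokens) * mask_ratio)
    query = sorted(indices[:n_masked])
    support = sorted(indices[n_masked:])
    return support, query


def amortize(tokens, support, w_enc=0.5):
    """Toy 'encoder' (assumption): a scaled mean of the unmasked tokens."""
    return w_enc * sum(tokens[i] for i in support) / len(support)


def support_loss(z, tokens, support, w_dec=1.0):
    """Mean squared error of a toy linear 'decoder' on the unmasked tokens."""
    return sum((w_dec * z - tokens[i]) ** 2 for i in support) / len(support)


def adapt(z, tokens, support, w_dec=1.0, lr=0.1):
    """One gradient step on the latent z with respect to the support loss."""
    grad = sum(2 * w_dec * (w_dec * z - tokens[i]) for i in support) / len(support)
    return z - lr * grad


tokens = [float(v) for v in range(1, 9)]        # a toy 8-token "input"
support, query = split_support_query(tokens)    # unmasked vs. masked indices
z = amortize(tokens, support)                   # amortized latent
z_adapted = adapt(z, tokens, support)           # adapted latent
print(support_loss(z, tokens, support), support_loss(z_adapted, tokens, support))
```

In this sketch the adapted latent fits the unmasked tokens better than the amortized one; in an actual model the encoder and decoder would be Transformers, and the adapted latent would then be used to predict the masked tokens.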

Cite

Text

Jang et al. "Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder." Neural Information Processing Systems, 2023.

Markdown

[Jang et al. "Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/jang2023neurips-modalityagnostic/)

BibTeX

@inproceedings{jang2023neurips-modalityagnostic,
  title     = {{Modality-Agnostic Self-Supervised Learning with Meta-Learned Masked Auto-Encoder}},
  author    = {Jang, Huiwon and Tack, Jihoon and Choi, Daewon and Jeong, Jongheon and Shin, Jinwoo},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/jang2023neurips-modalityagnostic/}
}