A Survey on Masked Autoencoder for Visual Self-Supervised Learning
Abstract
With the increasing popularity of masked autoencoders, self-supervised learning (SSL) in vision is following a trajectory similar to that in NLP. Specifically, generative pretext tasks based on masked prediction have become the de facto standard SSL practice in NLP (e.g., BERT). By contrast, early attempts at generative methods in vision were outperformed by their discriminative counterparts (such as contrastive learning). However, the success of masked image modeling has revived autoencoder-based visual pretraining. As a milestone that bridges the gap with BERT in NLP, the masked autoencoder in vision has attracted unprecedented attention. This work conducts a survey on masked autoencoders for visual SSL.
Cite
Text
Zhang et al. "A Survey on Masked Autoencoder for Visual Self-Supervised Learning." International Joint Conference on Artificial Intelligence, 2023. doi:10.24963/IJCAI.2023/762
Markdown
[Zhang et al. "A Survey on Masked Autoencoder for Visual Self-Supervised Learning." International Joint Conference on Artificial Intelligence, 2023.](https://mlanthology.org/ijcai/2023/zhang2023ijcai-survey/) doi:10.24963/IJCAI.2023/762
BibTeX
@inproceedings{zhang2023ijcai-survey,
title = {{A Survey on Masked Autoencoder for Visual Self-Supervised Learning}},
author = {Zhang, Chaoning and Zhang, Chenshuang and Song, Junha and Yi, John Seon Keun and Kweon, In So},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2023},
pages = {6805--6813},
doi = {10.24963/IJCAI.2023/762},
url = {https://mlanthology.org/ijcai/2023/zhang2023ijcai-survey/}
}