Rethinking Patch Dependence for Masked Autoencoders

Abstract

In this work, we present cross-attention masked autoencoders (CrossMAE). This framework employs only cross-attention in the decoder to independently read out reconstructions for a small subset of masked patches from the encoder outputs, yet it achieves performance comparable or superior to that of traditional MAE across models ranging from ViT-S to ViT-H. CrossMAE challenges the necessity of interaction between mask tokens for effective masked pretraining. Code is available [here](https://anonymous.4open.science/r/mae-cross-anon-11EB/README.md).
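To make the idea in the abstract concrete, below is a minimal PyTorch sketch (not the authors' implementation; all class names, dimensions, and hyperparameters are placeholders) of a cross-attention-only decoder: each sampled masked patch is represented by a query built from a shared mask token plus that patch's positional embedding, and each query attends only to the encoder outputs, so mask tokens never interact with one another.

```python
# Hypothetical sketch of a cross-attention-only MAE decoder that reconstructs
# only a sampled subset of masked patches. Names and sizes are illustrative.
import torch
import torch.nn as nn


class CrossAttentionDecoderBlock(nn.Module):
    """One decoder block: mask queries cross-attend to encoder features; no self-attention."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_mlp = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, queries: torch.Tensor, enc_feats: torch.Tensor) -> torch.Tensor:
        # Each query attends only to encoder features (keys/values), never to other queries.
        q = self.norm_q(queries)
        kv = self.norm_kv(enc_feats)
        attn_out, _ = self.cross_attn(q, kv, kv)
        queries = queries + attn_out
        # The MLP is applied per token, so mask tokens still do not interact.
        queries = queries + self.mlp(self.norm_mlp(queries))
        return queries


class CrossAttentionDecoder(nn.Module):
    """Reconstructs a sampled subset of masked patches independently from encoder outputs."""

    def __init__(self, dim: int = 512, depth: int = 4, patch_pixels: int = 16 * 16 * 3):
        super().__init__()
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.blocks = nn.ModuleList([CrossAttentionDecoderBlock(dim) for _ in range(depth)])
        self.head = nn.Linear(dim, patch_pixels)

    def forward(self, enc_feats: torch.Tensor, masked_pos_embed: torch.Tensor) -> torch.Tensor:
        # One query per masked patch to be predicted: mask token + that patch's positional embedding.
        B, num_pred, _ = masked_pos_embed.shape
        queries = self.mask_token.expand(B, num_pred, -1) + masked_pos_embed
        for blk in self.blocks:
            queries = blk(queries, enc_feats)
        return self.head(queries)  # per-patch pixel predictions


# Example shapes: 49 visible-patch features from the encoder; predict 25 of the masked patches.
decoder = CrossAttentionDecoder(dim=512)
enc_feats = torch.randn(2, 49, 512)    # encoder outputs for visible patches
masked_pos = torch.randn(2, 25, 512)   # positional embeddings of the sampled masked patches
pred = decoder(enc_feats, masked_pos)  # -> (2, 25, 768)
```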

Cite

Text

Fu et al. "Rethinking Patch Dependence for Masked Autoencoders." NeurIPS 2024 Workshops: SSL, 2024.

Markdown

[Fu et al. "Rethinking Patch Dependence for Masked Autoencoders." NeurIPS 2024 Workshops: SSL, 2024.](https://mlanthology.org/neuripsw/2024/fu2024neuripsw-rethinking/)

BibTeX

@inproceedings{fu2024neuripsw-rethinking,
  title     = {{Rethinking Patch Dependence for Masked Autoencoders}},
  author    = {Fu, Letian and Lian, Long and Wang, Renhao and Shi, Baifeng and Wang, XuDong and Yala, Adam and Darrell, Trevor and Efros, Alexei A. and Goldberg, Ken},
  booktitle = {NeurIPS 2024 Workshops: SSL},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/fu2024neuripsw-rethinking/}
}