Rethinking Patch Dependence for Masked Autoencoders
Abstract
In this work, we present cross-attention masked autoencoders (CrossMAE). This framework uses only cross-attention in the decoder to independently read out reconstructions for a small subset of masked patches from the encoder outputs, yet it achieves performance comparable or superior to traditional MAE across models ranging from ViT-S to ViT-H. CrossMAE thus challenges the necessity of interaction between mask tokens for effective masked pretraining. Code is available [here](https://anonymous.4open.science/r/mae-cross-anon-11EB/README.md).
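The mechanism described in the abstract can be sketched in a few lines: a decoder block whose queries are mask tokens for a sampled subset of masked patches and whose keys and values are the encoder outputs, with no self-attention among the mask tokens. The PyTorch sketch below is illustrative only and is not the authors' implementation; the module names, dimensions, and the 25% prediction ratio are assumptions.

```python
# Minimal sketch (assumptions, not the released CrossMAE code): decode a subset
# of masked patches with cross-attention only, so mask tokens never interact.
import torch
import torch.nn as nn

class CrossAttentionDecoderBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.norm_q = nn.LayerNorm(dim)
        self.norm_kv = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm_mlp = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, queries: torch.Tensor, encoder_out: torch.Tensor) -> torch.Tensor:
        # Queries are mask tokens (plus positional embeddings) for the patches to
        # predict; keys and values come from the visible-patch encoder outputs.
        # There is no self-attention among queries, so each masked patch is
        # reconstructed independently of the others.
        kv = self.norm_kv(encoder_out)
        attn_out, _ = self.cross_attn(self.norm_q(queries), kv, kv)
        x = queries + attn_out
        x = x + self.mlp(self.norm_mlp(x))
        return x

# Usage: read out reconstructions for only a small random subset of masked positions.
dim, num_masked, num_visible = 512, 196, 49
encoder_out = torch.randn(2, num_visible, dim)           # outputs of the MAE encoder
mask_queries = torch.randn(2, num_masked, dim)           # mask token + pos. embedding per masked patch
subset = torch.randperm(num_masked)[: num_masked // 4]   # predict ~25% of masked patches (assumed ratio)
block = CrossAttentionDecoderBlock(dim)
pred = block(mask_queries[:, subset], encoder_out)       # (2, len(subset), dim) -> pixel prediction head
```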
Cite
Text
Fu et al. "Rethinking Patch Dependence for Masked Autoencoders." NeurIPS 2024 Workshops: SSL, 2024.
Markdown
[Fu et al. "Rethinking Patch Dependence for Masked Autoencoders." NeurIPS 2024 Workshops: SSL, 2024.](https://mlanthology.org/neuripsw/2024/fu2024neuripsw-rethinking/)
BibTeX
@inproceedings{fu2024neuripsw-rethinking,
title = {{Rethinking Patch Dependence for Masked Autoencoders}},
author = {Fu, Letian and Lian, Long and Wang, Renhao and Shi, Baifeng and Wang, XuDong and Yala, Adam and Darrell, Trevor and Efros, Alexei A. and Goldberg, Ken},
booktitle = {NeurIPS 2024 Workshops: SSL},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/fu2024neuripsw-rethinking/}
}