Cross-Scale MAE: A Tale of Multiscale Exploitation in Remote Sensing

Abstract

Remote sensing images present unique challenges to image analysis due to their extensive geographic coverage, hardware limitations, and misaligned multi-scale imagery. This paper revisits the classical multi-scale representation learning problem under the general framework of self-supervised learning for remote sensing image understanding. We present Cross-Scale MAE, a self-supervised model built upon the Masked Auto-Encoder (MAE). During pre-training, Cross-Scale MAE employs scale augmentation techniques and enforces cross-scale consistency constraints through both contrastive and generative losses to ensure consistent and meaningful representations well-suited for a wide range of downstream tasks. Further, our implementation leverages the xFormers library to accelerate network pre-training on a single GPU while maintaining the quality of learned representations. Experimental evaluations demonstrate that Cross-Scale MAE exhibits superior performance compared to standard MAE and other state-of-the-art remote sensing MAE methods.
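The abstract names two core mechanisms: scale augmentation of the input imagery and a cross-scale consistency constraint with a contrastive loss. Below is a minimal NumPy sketch of both ideas, not the authors' implementation; the average-pooling augmentation, the InfoNCE-style loss, and all function names and shapes are illustrative assumptions.

```python
import numpy as np

def scale_augment(img: np.ndarray, factor: int) -> np.ndarray:
    """Simulate a coarser ground-sample distance by average-pooling the
    image by `factor` along each spatial axis (an assumed augmentation)."""
    h, w = img.shape[:2]
    h2, w2 = h // factor, w // factor
    crop = img[:h2 * factor, :w2 * factor]
    return crop.reshape(h2, factor, w2, factor, -1).mean(axis=(1, 3))

def info_nce(z_a: np.ndarray, z_b: np.ndarray, tau: float = 0.1) -> float:
    """Contrastive (InfoNCE-style) consistency loss: matching rows of z_a
    and z_b (embeddings of the same scene at two scales) are positives,
    all other rows in the batch are negatives."""
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / tau                    # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # -log p(positive pair)

# Toy usage: one image, two scales, random stand-in embeddings.
rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
coarse = scale_augment(img, 2)                    # (32, 32, 3) coarser view
z_fine = rng.standard_normal((8, 16))             # stand-in encoder outputs
z_coarse = rng.standard_normal((8, 16))
loss = info_nce(z_fine, z_coarse)
```

In the full model this contrastive term would be combined with the generative (masked-reconstruction) loss of the MAE backbone; the sketch only isolates the cross-scale consistency piece.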

Cite

Text

Tang et al. "Cross-Scale MAE: A Tale of Multiscale Exploitation in Remote Sensing." Neural Information Processing Systems, 2023.

Markdown

[Tang et al. "Cross-Scale MAE: A Tale of Multiscale Exploitation in Remote Sensing." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/tang2023neurips-crossscale/)

BibTeX

@inproceedings{tang2023neurips-crossscale,
  title     = {{Cross-Scale MAE: A Tale of Multiscale Exploitation in Remote Sensing}},
  author    = {Tang, Maofeng and Cozma, Andrei and Georgiou, Konstantinos and Qi, Hairong},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/tang2023neurips-crossscale/}
}