Extending Global-Local View Alignment for Self-Supervised Learning with Remote Sensing Imagery

Abstract

Since a large number of high-quality remote sensing images are readily accessible, exploiting this corpus of images with less manual annotation is drawing increasing attention. Self-supervised models acquire general feature representations by formulating a pretext task that generates pseudo-labels for massive unlabeled data to provide supervision for training. While prior studies have explored multiple self-supervised learning techniques in the remote sensing domain, pretext tasks based on global-local view alignment remain underexplored, despite achieving state-of-the-art results on natural imagery. Inspired by DINO [6], which employs an effective representation learning structure with knowledge distillation based on global-local view alignment, we formulate two pretext tasks for self-supervised learning on remote sensing imagery (SSLRS). Using these tasks, we explore the effectiveness of positive temporal contrast as well as multi-sized views on SSLRS. We extend DINO and propose DINO-MC, which uses local views of variously sized crops instead of a single fixed size in order to alleviate the limited variation in object size observed in remote sensing imagery. Our experiments demonstrate that even when pre-trained on only 10% of the dataset, DINO-MC performs on par with or better than existing state-of-the-art SSLRS methods on multiple remote sensing tasks, while using fewer computational resources. All code, models, and results are released at https://github.com/WennyXY/DINO-MC.
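The multi-sized local views described in the abstract can be illustrated with a minimal sketch. It is not taken from the released code: the names `CropBox`, `sample_views`, and `alignment_pairs`, the image size, and the crop sizes are all hypothetical choices made for illustration. The pairing rule assumes the usual DINO arrangement, in which the teacher sees only global views and the student sees every view.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class CropBox:
    left: int
    top: int
    size: int  # side length of the square crop, in pixels
    kind: str  # "global" or "local"


def sample_views(image_size, global_size, local_sizes, n_global=2, rng=None):
    """Sample square crop boxes: n_global global views plus one local view
    per entry of local_sizes (the sizes here are illustrative assumptions)."""
    rng = rng or random.Random()
    plan = [("global", global_size)] * n_global
    plan += [("local", size) for size in local_sizes]
    views = []
    for kind, size in plan:
        if size > image_size:
            raise ValueError(f"crop size {size} exceeds image size {image_size}")
        # Pick a top-left corner so the crop stays inside the image.
        left = rng.randint(0, image_size - size)
        top = rng.randint(0, image_size - size)
        views.append(CropBox(left, top, size, kind))
    return views


def alignment_pairs(views):
    """Return (teacher_index, student_index) pairs to align.

    Assumed DINO-style rule: the teacher encodes global views only, the
    student encodes all views, and a view is never paired with itself.
    """
    pairs = []
    for t_idx, teacher_view in enumerate(views):
        if teacher_view.kind != "global":
            continue
        for s_idx in range(len(views)):
            if s_idx != t_idx:
                pairs.append((t_idx, s_idx))
    return pairs


# Hypothetical configuration: a 256-pixel tile, 224-pixel global crops,
# and four local crops of different sizes instead of one fixed size.
views = sample_views(256, 224, [64, 96, 128, 160], rng=random.Random(0))
pairs = alignment_pairs(views)
```

With a single fixed local size, every entry of `local_sizes` would be identical; varying the entries is the only change this sketch makes to the view-sampling step.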

Cite

Text

Wanyan et al. "Extending Global-Local View Alignment for Self-Supervised Learning with Remote Sensing Imagery." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00251

Markdown

[Wanyan et al. "Extending Global-Local View Alignment for Self-Supervised Learning with Remote Sensing Imagery." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/wanyan2024cvprw-extending/) doi:10.1109/CVPRW63382.2024.00251

BibTeX

@inproceedings{wanyan2024cvprw-extending,
  title     = {{Extending Global-Local View Alignment for Self-Supervised Learning with Remote Sensing Imagery}},
  author    = {Wanyan, Xinye and Seneviratne, Sachith and Shen, Shuchang and Kirley, Michael},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {2443-2453},
  doi       = {10.1109/CVPRW63382.2024.00251},
  url       = {https://mlanthology.org/cvprw/2024/wanyan2024cvprw-extending/}
}