ViTs for SITS: Vision Transformers for Satellite Image Time Series

Michail Tarasiou, Erik Chavez, Stefanos Zafeiriou

CVPR 2023 pp. 10418-10428

doi:10.1109/CVPR52729.2023.01004

Abstract

In this paper we introduce the Temporo-Spatial Vision Transformer (TSViT), a fully-attentional model for general Satellite Image Time Series (SITS) processing based on the Vision Transformer (ViT). TSViT splits a SITS record into non-overlapping patches in space and time, which are tokenized and subsequently processed by a factorized temporo-spatial encoder. We argue that, in contrast to natural images, a temporal-then-spatial factorization is more intuitive for SITS processing, and we present experimental evidence for this claim. Additionally, we enhance the model's discriminative power by introducing two novel mechanisms: acquisition-time-specific temporal positional encodings and multiple learnable class tokens. The effect of all novel design choices is evaluated through an extensive ablation study. Our proposed architecture achieves state-of-the-art performance, surpassing previous approaches by a significant margin on three publicly available SITS semantic segmentation and classification datasets. All model, training, and evaluation code can be found at https://github.com/michaeltrs/DeepSatModels.
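The patch splitting and the temporal-then-spatial ordering described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that): the function names `tokenize_sits` and `temporal_first_layout`, the `(T, H, W, C)` input layout, and the patch sizes are all assumptions chosen for illustration, and the linear projection, attention layers, positional encodings, and class tokens are omitted.

```python
import numpy as np


def tokenize_sits(x, spatial_patch, temporal_patch=1):
    """Split a SITS record into non-overlapping space-time patches.

    x is assumed to have shape (T, H, W, C): time steps, height, width, channels.
    Returns an array of shape (T // t, N, t * p * p * C), where
    N = (H // p) * (W // p) is the number of spatial patch locations.
    (Hypothetical helper, not part of the authors' code.)
    """
    T, H, W, C = x.shape
    p, t = spatial_patch, temporal_patch
    if T % t or H % p or W % p:
        raise ValueError("T, H and W must be divisible by the patch sizes")
    nt, nh, nw = T // t, H // p, W // p
    # Expose the patch grid as separate axes: (nt, t, nh, p, nw, p, C).
    x = x.reshape(nt, t, nh, p, nw, p, C)
    # Group grid axes first, then the contents of each patch.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)  # (nt, nh, nw, t, p, p, C)
    # Flatten each patch into one raw token vector.
    return x.reshape(nt, nh * nw, t * p * p * C)


def temporal_first_layout(tokens):
    """Rearrange (time, location, dim) tokens to (location, time, dim).

    In this layout each spatial location is an independent sequence over
    time, which is the input shape a temporal encoder stage would consume
    before a spatial stage mixes information across locations.
    """
    return tokens.transpose(1, 0, 2)


# Toy record: 4 acquisitions of a 4x4 image with 2 spectral bands.
record = np.arange(4 * 4 * 4 * 2, dtype=np.float32).reshape(4, 4, 4, 2)
tokens = tokenize_sits(record, spatial_patch=2)
temporal_sequences = temporal_first_layout(tokens)
print(tokens.shape, temporal_sequences.shape)
```

In this sketch a temporal stage would attend along the second axis of `temporal_sequences`, one spatial location at a time, and a spatial stage would then attend across locations using the per-location outputs.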

Cite

Text

Tarasiou et al. "ViTs for SITS: Vision Transformers for Satellite Image Time Series." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.01004

Markdown

[Tarasiou et al. "ViTs for SITS: Vision Transformers for Satellite Image Time Series." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/tarasiou2023cvpr-vits/) doi:10.1109/CVPR52729.2023.01004

BibTeX

@inproceedings{tarasiou2023cvpr-vits,
  title     = {{ViTs for SITS: Vision Transformers for Satellite Image Time Series}},
  author    = {Tarasiou, Michail and Chavez, Erik and Zafeiriou, Stefanos},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {10418--10428},
  doi       = {10.1109/CVPR52729.2023.01004},
  url       = {https://mlanthology.org/cvpr/2023/tarasiou2023cvpr-vits/}
}