OmniSat: Self-Supervised Modality Fusion for Earth Observation

Abstract

The diversity and complementarity of sensors available for Earth Observation (EO) call for bespoke self-supervised multimodal learning approaches. However, current multimodal EO datasets and models typically focus on a single data type, either mono-date images or time series, which limits their impact. To address this issue, we introduce OmniSat, a novel architecture that merges diverse EO modalities into expressive features without labels by exploiting their alignment. To demonstrate the advantages of our approach, we create two new multimodal datasets by augmenting existing ones with new modalities. As demonstrated on three downstream tasks (forestry, land cover classification, and crop mapping), OmniSat can learn rich representations without supervision, leading to state-of-the-art performance in semi- and fully supervised settings. Furthermore, our multimodal pretraining scheme improves performance even when only one modality is available for inference. The code and dataset are available online.

Cite

Text

Astruc et al. "OmniSat: Self-Supervised Modality Fusion for Earth Observation." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73390-1_24

Markdown

[Astruc et al. "OmniSat: Self-Supervised Modality Fusion for Earth Observation." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/astruc2024eccv-omnisat/) doi:10.1007/978-3-031-73390-1_24

BibTeX

@inproceedings{astruc2024eccv-omnisat,
  title     = {{OmniSat: Self-Supervised Modality Fusion for Earth Observation}},
  author    = {Astruc, Guillaume and Gonthier, Nicolas and Mallet, Clement and Landrieu, Loic},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73390-1_24},
  url       = {https://mlanthology.org/eccv/2024/astruc2024eccv-omnisat/}
}