SARFormer - An Acquisition Parameter Aware Vision Transformer for Synthetic Aperture Radar Data

Abstract

This manuscript introduces SARFormer, a modified Vision Transformer (ViT) architecture designed for processing one or more synthetic aperture radar (SAR) images. Because SAR data has a complex image geometry, we propose an acquisition parameter encoding module that guides the learning process, especially when multiple images are used, and leads to improved performance on downstream tasks. We further explore self-supervised pre-training, conduct experiments with limited labeled data, and benchmark our contributions against a baseline in ablation experiments on tasks such as height reconstruction and segmentation. Our approach achieves up to a 17% improvement in RMSE over baseline models, marking an important step toward developing widely applicable SAR foundation models.
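As a reading aid for the RMSE figure quoted in the abstract, the following is a minimal sketch of how a root-mean-square error and a relative improvement over a baseline could be computed. It assumes the reported percentage is a relative reduction in RMSE, which the abstract does not state explicitly. The function names `rmse` and `relative_improvement` and all numeric values are hypothetical illustrations and are not taken from the paper.

```python
import math


def rmse(pred, target):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for p, t in zip(pred, target)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse, model_rmse):
    """Fractional RMSE reduction relative to the baseline (assumed definition)."""
    if baseline_rmse <= 0:
        raise ValueError("baseline_rmse must be positive")
    return (baseline_rmse - model_rmse) / baseline_rmse


if __name__ == "__main__":
    # Hypothetical height values in metres, for illustration only.
    target = [10.0, 12.0, 8.0, 15.0]
    baseline_pred = [12.0, 10.0, 10.0, 13.0]
    model_pred = [11.0, 11.0, 9.0, 14.0]

    base = rmse(baseline_pred, target)
    ours = rmse(model_pred, target)
    print(f"baseline RMSE: {base:.3f}")
    print(f"model RMSE:    {ours:.3f}")
    print(f"relative improvement: {relative_improvement(base, ours):.1%}")
```

Under this assumed definition, a baseline RMSE of 2.00 and a model RMSE of 1.66 would correspond to a 17% improvement.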

Cite

Text

Prexl et al. "SARFormer - An Acquisition Parameter Aware Vision Transformer for Synthetic Aperture Radar Data." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown

[Prexl et al. "SARFormer - An Acquisition Parameter Aware Vision Transformer for Synthetic Aperture Radar Data." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/prexl2025cvprw-sarformer/)

BibTeX

@inproceedings{prexl2025cvprw-sarformer,
  title     = {{SARFormer - An Acquisition Parameter Aware Vision Transformer for Synthetic Aperture Radar Data}},
  author    = {Prexl, Jonathan and Recla, Michael and Schmitt, Michael},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2025},
  pages     = {2225--2234},
  url       = {https://mlanthology.org/cvprw/2025/prexl2025cvprw-sarformer/}
}