Attention Decomposition for Cross-Domain Semantic Segmentation

Abstract

This work addresses cross-domain semantic segmentation. While recent CNNs and proposal-free transformers have led to significant advances, we introduce ADFormer, a new transformer with a lighter encoder and a more complex decoder that uses query tokens to predict segmentation masks. The domain gap between the source and target domains is reduced with two mechanisms. First, we decompose cross-attention in the decoder into domain-independent and domain-specific parts, so that the query tokens are made to interact with the domain-independent aspects of the image tokens, shared by the source and target domains, rather than their domain-specific counterparts, which induce the domain gap. Second, we use a gradient reversal block to control back-propagation of the gradient, thereby introducing adversarial learning in the decoder of ADFormer. Our results on two benchmark domain shifts – GTA to Cityscapes and SYNTHIA to Cityscapes – show that ADFormer outperforms SOTA proposal-free methods with significantly lower complexity. The implementation is available at https://github.com/helq2612/ADFormer.
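The gradient reversal block mentioned in the abstract follows the standard gradient reversal layer commonly used for adversarial domain adaptation (Ganin & Lempitsky). Below is a minimal PyTorch sketch of such a layer, not the authors' code (see the linked repository for the official implementation); the `lambd` scaling factor and the placement after decoder features are illustrative assumptions.

```python
import torch
import torch.nn as nn


class _GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates and scales the gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reversing the gradient trains the upstream features adversarially
        # against the domain discriminator that follows this layer.
        return -ctx.lambd * grad_output, None


class GradientReversalLayer(nn.Module):
    def __init__(self, lambd: float = 1.0):
        super().__init__()
        self.lambd = lambd

    def forward(self, x):
        return _GradReverse.apply(x, self.lambd)


# Hypothetical usage: decoder features pass through the reversal layer into a
# small domain classifier; minimizing the classifier's loss then pushes the
# features toward domain invariance.
grl = GradientReversalLayer(lambd=1.0)
domain_classifier = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 2))
features = torch.randn(8, 256)            # placeholder decoder features
domain_logits = domain_classifier(grl(features))
```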

Cite

Text

He and Todorovic. "Attention Decomposition for Cross-Domain Semantic Segmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72630-9_24

Markdown

[He and Todorovic. "Attention Decomposition for Cross-Domain Semantic Segmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/he2024eccv-attention/) doi:10.1007/978-3-031-72630-9_24

BibTeX

@inproceedings{he2024eccv-attention,
  title     = {{Attention Decomposition for Cross-Domain Semantic Segmentation}},
  author    = {He, Liqiang and Todorovic, Sinisa},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72630-9_24},
  url       = {https://mlanthology.org/eccv/2024/he2024eccv-attention/}
}