Attention Decomposition for Cross-Domain Semantic Segmentation
Abstract
This work addresses cross-domain semantic segmentation. While recent CNNs and proposal-free transformers have led to significant advances, we introduce a new transformer, called ADFormer, with a lighter encoder and a more complex decoder that uses query tokens for predicting segmentation masks. The domain gap between the source and target domains is reduced with two mechanisms. First, we decompose cross-attention in the decoder into domain-independent and domain-specific parts, enforcing that the query tokens interact with the domain-independent aspects of the image tokens, shared by the source and target domains, rather than with their domain-specific counterparts, which induce the domain gap. Second, we use the gradient reverse block to control back-propagation of the gradient, and hence introduce adversarial learning in the decoder of ADFormer. Our results on two benchmark domain shifts – GTA to Cityscapes and SYNTHIA to Cityscapes – show that ADFormer outperforms SOTA proposal-free methods with significantly lower complexity. The implementation is available at https://github.com/helq2612/ADFormer.
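To make the two mechanisms concrete, below is a minimal PyTorch sketch of (i) a gradient-reversal function that acts as identity in the forward pass and flips the gradient sign in the backward pass, and (ii) a cross-attention block whose key/value projections are split into a shared (domain-independent) branch and a domain-specific branch, with the latter routed through gradient reversal. The class and parameter names (GradReverse, DecomposedCrossAttention, kv_shared, kv_domain, lambd) and the exact way the two branches are combined are illustrative assumptions, not the authors' implementation; see the linked repository for the actual architecture.

```python
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; scales and negates the gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)


class DecomposedCrossAttention(nn.Module):
    """Cross-attention between query tokens and image tokens, decomposed into a
    domain-independent branch and a domain-specific branch. The domain-specific
    branch passes through gradient reversal so that adversarial training removes
    domain cues from it. This is a sketch of the idea, not the paper's exact module."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.q_proj = nn.Linear(dim, dim)
        # Separate key/value projections for the shared and domain-specific parts.
        self.kv_shared = nn.Linear(dim, 2 * dim)
        self.kv_domain = nn.Linear(dim, 2 * dim)
        self.out_proj = nn.Linear(dim, dim)

    def _attend(self, q, k, v):
        B, Nq, C = q.shape
        H = self.num_heads
        q = q.view(B, Nq, H, C // H).transpose(1, 2)   # (B, H, Nq, d)
        k = k.view(B, -1, H, C // H).transpose(1, 2)   # (B, H, Nk, d)
        v = v.view(B, -1, H, C // H).transpose(1, 2)   # (B, H, Nk, d)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, H, Nq, Nk)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, Nq, C)
        return out

    def forward(self, queries, image_tokens, lambd=1.0):
        q = self.q_proj(queries)
        k_s, v_s = self.kv_shared(image_tokens).chunk(2, dim=-1)
        k_d, v_d = self.kv_domain(image_tokens).chunk(2, dim=-1)
        # Domain-independent interaction drives mask prediction.
        shared = self._attend(q, k_s, v_s)
        # Domain-specific interaction is adversarially suppressed via gradient reversal.
        domain = self._attend(q, grad_reverse(k_d, lambd), grad_reverse(v_d, lambd))
        return self.out_proj(shared + domain)


if __name__ == "__main__":
    attn = DecomposedCrossAttention(dim=256)
    queries = torch.randn(2, 100, 256)        # 100 query tokens
    image_tokens = torch.randn(2, 1024, 256)  # flattened image features
    print(attn(queries, image_tokens).shape)  # torch.Size([2, 100, 256])
```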
Cite
Text
He and Todorovic. "Attention Decomposition for Cross-Domain Semantic Segmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72630-9_24

Markdown
[He and Todorovic. "Attention Decomposition for Cross-Domain Semantic Segmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/he2024eccv-attention/) doi:10.1007/978-3-031-72630-9_24

BibTeX
@inproceedings{he2024eccv-attention,
title = {{Attention Decomposition for Cross-Domain Semantic Segmentation}},
author = {He, Liqiang and Todorovic, Sinisa},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-72630-9_24},
url = {https://mlanthology.org/eccv/2024/he2024eccv-attention/}
}