ReMamber: Referring Image Segmentation with Mamba Twister
Abstract
Referring Image Segmentation (RIS) leveraging transformers has achieved great success in interpreting complex visual-language tasks. However, the quadratic computation cost of transformers makes capturing long-range visual-language dependencies resource-intensive. Mamba addresses this with efficient linear-complexity processing. Directly applying Mamba to multi-modal interactions is nevertheless challenging, primarily because its channel interactions are inadequate for effectively fusing multi-modal data. In this paper, we propose ReMamber, a novel RIS architecture that integrates the power of Mamba with a multi-modal Mamba Twister block. The Mamba Twister explicitly models image-text interaction and fuses textual and visual features through its unique channel and spatial twisting mechanism. We achieve competitive results on three challenging benchmarks with a simple and efficient architecture. Moreover, we conduct thorough analyses of ReMamber and discuss other fusion designs using Mamba, which provide valuable perspectives for future research. The code has been released at: https://github.com/yyh-rain-song/ReMamber.
Cite
Text
Yang et al. "ReMamber: Referring Image Segmentation with Mamba Twister." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72684-2_7
Markdown
[Yang et al. "ReMamber: Referring Image Segmentation with Mamba Twister." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/yang2024eccv-remamber/) doi:10.1007/978-3-031-72684-2_7
BibTeX
@inproceedings{yang2024eccv-remamber,
title = {{ReMamber: Referring Image Segmentation with Mamba Twister}},
author = {Yang, Yuhuan and Ma, Chaofan and Yao, Jiangchao and Zhong, Zhun and Zhang, Ya and Wang, Yanfeng},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-72684-2_7},
url = {https://mlanthology.org/eccv/2024/yang2024eccv-remamber/}
}