Guided Slot Attention for Unsupervised Video Object Segmentation

Abstract

Unsupervised video object segmentation aims to segment the most prominent object in a video sequence. However, the presence of complex backgrounds and multiple foreground objects makes this task challenging. To address this issue, we propose a guided slot attention network to reinforce spatial structural information and obtain better foreground-background separation. The foreground and background slots, which are initialized with query guidance, are iteratively refined based on interactions with template information. Furthermore, to improve slot-template interaction and effectively fuse global and local features in the target and reference frames, K-nearest neighbors filtering and a feature aggregation transformer are introduced. The proposed model achieves state-of-the-art performance on two popular datasets. Additionally, we demonstrate the robustness of the proposed model in challenging scenes through various comparative experiments.
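The abstract builds on the standard slot attention mechanism, in which a small set of slots iteratively competes for input features via attention. The sketch below is a minimal, generic slot attention refinement loop (in the style of Locatello et al., 2020) for illustration only; the paper's query-guided slot initialization, KNN filtering, and feature aggregation transformer are not reproduced here, and the class name, dimensions, and iteration count are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlotAttention(nn.Module):
    """Generic slot attention refinement (illustrative sketch, not the paper's
    guided variant). Slots attend over flattened feature vectors and are
    updated with a GRU for a fixed number of iterations."""

    def __init__(self, dim, iters=3):
        super().__init__()
        self.iters = iters
        self.scale = dim ** -0.5
        self.norm_inputs = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, slots, inputs):
        # slots:  (B, S, D) initial slots, e.g. a foreground and a background slot
        # inputs: (B, N, D) flattened template/reference frame features
        inputs = self.norm_inputs(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        for _ in range(self.iters):
            slots_prev = slots
            q = self.to_q(self.norm_slots(slots))
            # Softmax over the slot axis so slots compete for each feature.
            attn = F.softmax(torch.einsum('bsd,bnd->bsn', q, k) * self.scale, dim=1)
            attn = attn / attn.sum(dim=-1, keepdim=True)
            updates = torch.einsum('bsn,bnd->bsd', attn, v)
            # GRU update keeps each slot close to its previous state.
            slots = self.gru(updates.reshape(-1, updates.shape[-1]),
                             slots_prev.reshape(-1, slots_prev.shape[-1]))
            slots = slots.reshape(slots_prev.shape)
        return slots

# Example usage (shapes are arbitrary): two slots attend over 1024 features.
# sa = SlotAttention(dim=256)
# refined = sa(torch.randn(2, 2, 256), torch.randn(2, 1024, 256))
```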

Cite

Text

Lee et al. "Guided Slot Attention for Unsupervised Video Object Segmentation." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00365

Markdown

[Lee et al. "Guided Slot Attention for Unsupervised Video Object Segmentation." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/lee2024cvpr-guided/) doi:10.1109/CVPR52733.2024.00365

BibTeX

@inproceedings{lee2024cvpr-guided,
  title     = {{Guided Slot Attention for Unsupervised Video Object Segmentation}},
  author    = {Lee, Minhyeok and Cho, Suhwan and Lee, Dogyoon and Park, Chaewon and Lee, Jungho and Lee, Sangyoun},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {3807-3816},
  doi       = {10.1109/CVPR52733.2024.00365},
  url       = {https://mlanthology.org/cvpr/2024/lee2024cvpr-guided/}
}