Putting the Object Back into Video Object Segmentation

Abstract

We present Cutie, a video object segmentation (VOS) network with object-level memory reading, which puts the object representation from memory back into the video object segmentation result. Recent works on VOS employ bottom-up pixel-level memory reading, which struggles with matching noise, especially in the presence of distractors, and therefore performs worse on more challenging data. In contrast, Cutie performs top-down object-level memory reading by adapting a small set of object queries. Through these queries, it interacts iteratively with the bottom-up pixel features using a query-based object transformer (qt, hence Cutie). The object queries act as a high-level summary of the target object, while high-resolution feature maps are retained for accurate segmentation. Together with foreground-background masked attention, Cutie cleanly separates the semantics of the foreground object from the background. On the challenging MOSE dataset, Cutie improves by 8.7 J&F over XMem with a similar running time and improves by 4.2 J&F over DeAOT while being three times faster. Code is available at: hkchengrex.github.io/Cutie
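The foreground-background masked attention mentioned above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes, for illustration only, that the first half of the object queries may attend only to foreground pixels and the second half only to background pixels, and that a single residual cross-attention step stands in for one iteration of the object transformer. The function name `masked_cross_attention`, its parameters, and all shapes are hypothetical.

```python
import numpy as np

def masked_cross_attention(queries, pixel_feats, fg_mask):
    """One residual cross-attention step with a foreground/background mask.

    queries:     (N, C) object queries (hypothetical layout)
    pixel_feats: (HW, C) flattened pixel features
    fg_mask:     (HW,) boolean, True where a pixel is foreground
    """
    n, c = queries.shape
    half = n // 2

    # allowed[i, j] is True when query i may attend to pixel j.
    # Assumption: first half -> foreground only, second half -> background only.
    allowed = np.empty((n, pixel_feats.shape[0]), dtype=bool)
    allowed[:half] = fg_mask
    allowed[half:] = ~fg_mask

    # If a query has no allowed pixel, fall back to attending everywhere
    # so that the softmax stays well defined.
    empty = ~allowed.any(axis=1)
    allowed[empty] = True

    # Scaled dot-product attention with disallowed pixels masked out.
    logits = queries @ pixel_feats.T / np.sqrt(c)
    logits = np.where(allowed, logits, -np.inf)
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Residual update: each query summarizes only its allowed region.
    return queries + weights @ pixel_feats, weights
```

With this masking, the foreground queries never receive information from background pixels within the step (and vice versa), which is one simple way to keep the two sets of semantics apart.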

Cite

Text

Cheng et al. "Putting the Object Back into Video Object Segmentation." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00304

Markdown

[Cheng et al. "Putting the Object Back into Video Object Segmentation." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/cheng2024cvpr-putting/) doi:10.1109/CVPR52733.2024.00304

BibTeX

@inproceedings{cheng2024cvpr-putting,
  title     = {{Putting the Object Back into Video Object Segmentation}},
  author    = {Cheng, Ho Kei and Oh, Seoung Wug and Price, Brian and Lee, Joon-Young and Schwing, Alexander},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {3151-3161},
  doi       = {10.1109/CVPR52733.2024.00304},
  url       = {https://mlanthology.org/cvpr/2024/cheng2024cvpr-putting/}
}