Tracking Anything with Decoupled Video Segmentation

Abstract

Training data for video segmentation are expensive to annotate. This impedes extensions of end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary settings. To 'track anything' without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation. Due to this design, we only need an image-level model for the target task (which is cheaper to train) and a universal temporal propagation model which is trained once and generalizes across tasks. To effectively combine these two modules, we use bi-directional propagation for (semi-)online fusion of segmentation hypotheses from different frames to generate a coherent segmentation. We show that this decoupled formulation compares favorably to end-to-end approaches in several data-scarce tasks including large-vocabulary video panoptic segmentation, open-world video segmentation, referring video segmentation, and unsupervised video object segmentation. Code is available at: https://hkchengrex.github.io/Tracking-Anything-with-DEVA.
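The abstract describes a (semi-)online loop: a task-specific image-level model proposes segments, a universal class-agnostic propagation model carries them across frames, and the two streams are fused periodically into one coherent segmentation. The sketch below illustrates only that control flow; image_segment, propagate, fuse, and the memory update are hypothetical placeholders rather than DEVA's actual API, and the in-clip consensus over a few near-future frames that makes the fusion bi-directional is omitted for brevity.

# Minimal sketch of the decoupled (semi-)online loop, under the
# assumptions stated above. All callables are hypothetical stand-ins:
#   image_segment : task-specific image model, frame -> segmentation hypothesis
#   propagate     : class/task-agnostic temporal propagation, (memory, frame) -> segmentation
#   fuse          : fusion of propagated segments with a fresh hypothesis
from typing import Callable, Iterable, List

def decoupled_video_segmentation(
    frames: Iterable,
    image_segment: Callable,
    propagate: Callable,
    fuse: Callable,
    detect_every: int = 5,   # how often to invoke the (expensive) image model
) -> List:
    memory = None            # internal state of the propagation module
    outputs = []
    for t, frame in enumerate(frames):
        if memory is None:
            # Initialize from the image-level model on the first frame.
            seg = image_segment(frame)
        else:
            # Temporal propagation carries existing segments forward.
            seg = propagate(memory, frame)
            if t % detect_every == 0:
                # Periodically fuse a fresh image-level hypothesis with the
                # propagated result to admit new objects and correct drift.
                seg = fuse(seg, image_segment(frame))
        memory = (memory, frame, seg)  # placeholder memory update
        outputs.append(seg)
    return outputs

Because the propagation and fusion steps are task-agnostic, only image_segment needs to change to target a new task, which is the data-efficiency argument the abstract makes.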

Cite

Text

Cheng et al. "Tracking Anything with Decoupled Video Segmentation." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00127

Markdown

[Cheng et al. "Tracking Anything with Decoupled Video Segmentation." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/cheng2023iccv-tracking/) doi:10.1109/ICCV51070.2023.00127

BibTeX

@inproceedings{cheng2023iccv-tracking,
  title     = {{Tracking Anything with Decoupled Video Segmentation}},
  author    = {Cheng, Ho Kei and Oh, Seoung Wug and Price, Brian and Schwing, Alexander and Lee, Joon-Young},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {1316--1326},
  doi       = {10.1109/ICCV51070.2023.00127},
  url       = {https://mlanthology.org/iccv/2023/cheng2023iccv-tracking/}
}