Video Object Segmentation by Tracking Regions

Abstract

This paper presents an approach to unsupervised segmentation of moving and static objects occurring in a video. Objects are, in general, spatially cohesive and characterized by locally smooth motion trajectories. Therefore, they occupy regions within each frame, while the shape and location of these regions vary slowly from frame to frame. Thus, video segmentation can be done by tracking regions across the frames such that the resulting tracks are locally smooth. To this end, we use a low-level segmentation to extract regions in all frames, and then we transitively match and cluster the similar regions across the video. The similarity is defined with respect to the regions' photometric, geometric, and motion properties. We formulate a new circular dynamic-time warping (CDTW) algorithm that generalizes DTW to match closed boundaries of two regions, without compromising DTW's guarantees of achieving the optimal solution with linear complexity. Our quantitative evaluation and comparison with the state of the art suggest that the proposed approach is a competitive alternative to currently prevailing point-based methods.
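To make the idea of matching closed boundaries concrete, the following is a minimal sketch of the underlying problem, not the paper's CDTW algorithm. It assumes each region boundary has been sampled into a 1-D sequence of descriptor values (a hypothetical representation chosen for illustration), and it handles the unknown starting point on a closed contour naively, by running classic DTW against every cyclic shift of the second sequence. The function names `dtw_cost` and `circular_dtw_bruteforce` are illustrative only, and this brute-force baseline is considerably more expensive than the method the abstract describes.

```python
import math


def dtw_cost(a, b):
    """Classic DTW alignment cost between two 1-D sequences.

    The local cost is the absolute difference between samples.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the best cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


def circular_dtw_bruteforce(a, b):
    """Naive circular matching: try every cyclic shift of b.

    Returns (best_cost, best_shift). This is an illustrative baseline,
    assumed here for exposition; it is not the CDTW algorithm of the paper.
    """
    best_cost, best_shift = math.inf, 0
    for s in range(len(b)):
        rotated = b[s:] + b[:s]
        c = dtw_cost(a, rotated)
        if c < best_cost:
            best_cost, best_shift = c, s
    return best_cost, best_shift


# Two samplings of the same closed contour that start at different points.
contour_a = [0, 1, 2, 3, 2, 1]
contour_b = [2, 1, 0, 1, 2, 3]
print(circular_dtw_bruteforce(contour_a, contour_b))
```

In this toy example the second sequence is the first one rotated by two samples, so the search recovers a shift of 2 with zero alignment cost. Plain DTW without the shift search would report a nonzero cost, because it forces the first samples of both sequences to be aligned.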

Cite

Text

Brendel and Todorovic. "Video Object Segmentation by Tracking Regions." IEEE/CVF International Conference on Computer Vision, 2009. doi:10.1109/ICCV.2009.5459242

Markdown

[Brendel and Todorovic. "Video Object Segmentation by Tracking Regions." IEEE/CVF International Conference on Computer Vision, 2009.](https://mlanthology.org/iccv/2009/brendel2009iccv-video/) doi:10.1109/ICCV.2009.5459242

BibTeX

@inproceedings{brendel2009iccv-video,
  title     = {{Video Object Segmentation by Tracking Regions}},
  author    = {Brendel, William and Todorovic, Sinisa},
  booktitle = {IEEE/CVF International Conference on Computer Vision},
  year      = {2009},
  pages     = {833--840},
  doi       = {10.1109/ICCV.2009.5459242},
  url       = {https://mlanthology.org/iccv/2009/brendel2009iccv-video/}
}