Track and Segment: An Iterative Unsupervised Approach for Video Object Proposals

Abstract

We present an unsupervised approach that generates a diverse, ranked set of bounding box and segmentation video object proposals---spatio-temporal tubes that localize the foreground objects---in an unannotated video. In contrast to previous unsupervised methods that either track regions initialized in an arbitrary frame or train a fixed model over a cluster of regions, we instead discover a set of easy-to-group instances of an object and then iteratively update its appearance model to gradually detect harder instances in temporally adjacent frames. Our method first generates a set of spatio-temporal bounding box proposals, and then refines them to obtain pixel-wise segmentation proposals. Through extensive experiments, we demonstrate state-of-the-art segmentation results on the SegTrack v2 dataset, and bounding box tracking results that are competitive with state-of-the-art supervised tracking methods.

Cite

Text

Xiao and Lee. "Track and Segment: An Iterative Unsupervised Approach for Video Object Proposals." Conference on Computer Vision and Pattern Recognition, 2016. doi:10.1109/CVPR.2016.107

Markdown

[Xiao and Lee. "Track and Segment: An Iterative Unsupervised Approach for Video Object Proposals." Conference on Computer Vision and Pattern Recognition, 2016.](https://mlanthology.org/cvpr/2016/xiao2016cvpr-track/) doi:10.1109/CVPR.2016.107

BibTeX

@inproceedings{xiao2016cvpr-track,
  title     = {{Track and Segment: An Iterative Unsupervised Approach for Video Object Proposals}},
  author    = {Xiao, Fanyi and Lee, Yong Jae},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2016},
  doi       = {10.1109/CVPR.2016.107},
  url       = {https://mlanthology.org/cvpr/2016/xiao2016cvpr-track/}
}