Semantic Co-Segmentation in Videos

Abstract

Discovering and segmenting objects in videos is a challenging task due to large variations in object appearance, shape deformation, and cluttered backgrounds. In this paper, we propose to segment objects and understand their visual semantics from a collection of videos that link to each other, which we refer to as semantic co-segmentation. Without any prior knowledge of the videos, we first extract semantic objects and utilize a tracking-based approach to generate multiple object-like tracklets across each video. Each tracklet maintains temporally connected segments and is associated with a predicted category. To exploit rich information from other videos, we collect tracklets assigned to the same category from all videos and co-select tracklets that belong to true objects by solving a submodular function. This function accounts for object properties such as appearance, shape, and motion, and hence facilitates the co-segmentation process. Experiments on three video object segmentation datasets show that the proposed algorithm performs favorably against other state-of-the-art methods.
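The abstract mentions co-selecting tracklets by solving a submodular function. As a rough illustration only (the authors' actual objective and optimizer are defined in the paper, not here), a facility-location-style submodular objective over pairwise tracklet similarities can be maximized with the standard greedy algorithm:

```python
# Illustrative sketch of greedy submodular selection, NOT the authors'
# implementation: tracklets are represented only by a pairwise similarity
# matrix, and a facility-location objective is maximized greedily.

def greedy_select(sim, budget):
    """Pick `budget` representative tracklets.

    sim[i][j]: similarity between tracklets i and j (higher = more alike).
    Facility-location value of a set S is sum_j max_{i in S} sim[i][j];
    greedy selection gives a (1 - 1/e) approximation for this objective.
    """
    n = len(sim)
    selected = []
    # coverage[j] = best similarity of tracklet j to any selected tracklet
    coverage = [0.0] * n
    for _ in range(budget):
        best_gain, best_i = -1.0, -1
        for i in range(n):
            if i in selected:
                continue
            # marginal gain of adding tracklet i to the selected set
            gain = sum(max(sim[i][j] - coverage[j], 0.0) for j in range(n))
            if gain > best_gain:
                best_gain, best_i = gain, i
        selected.append(best_i)
        coverage = [max(coverage[j], sim[best_i][j]) for j in range(n)]
    return selected
```

In the paper's setting, the similarities would additionally encode appearance, shape, and motion cues, and selection is performed per category across videos; the sketch above only shows the generic greedy mechanism.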

Cite

Text

Tsai et al. "Semantic Co-Segmentation in Videos." European Conference on Computer Vision, 2016. doi:10.1007/978-3-319-46493-0_46

Markdown

[Tsai et al. "Semantic Co-Segmentation in Videos." European Conference on Computer Vision, 2016.](https://mlanthology.org/eccv/2016/tsai2016eccv-semantic/) doi:10.1007/978-3-319-46493-0_46

BibTeX

@inproceedings{tsai2016eccv-semantic,
  title     = {{Semantic Co-Segmentation in Videos}},
  author    = {Tsai, Yi-Hsuan and Zhong, Guangyu and Yang, Ming-Hsuan},
  booktitle = {European Conference on Computer Vision},
  year      = {2016},
  pages     = {760--775},
  doi       = {10.1007/978-3-319-46493-0_46},
  url       = {https://mlanthology.org/eccv/2016/tsai2016eccv-semantic/}
}