Cross-Granularity Graph Inference for Semantic Video Object Segmentation

Abstract

We address semantic video object segmentation via a novel cross-granularity hierarchical graphical model that integrates tracklet and object-proposal reasoning with superpixel labeling. Tracklets characterize the varying spatio-temporal relations of a video object but quite often suffer from sporadic local outliers. To acquire high-quality tracklets, we propose a transductive inference model capable of calibrating short-range noisy object tracklets against long-range dependencies and high-level context cues. At the center of this work lies a new paradigm for semantic video object segmentation that goes beyond modeling the appearance and motion of objects locally: the semantic label is inferred by jointly exploiting multi-scale contextual information and the spatio-temporal relations of the video object. We evaluate our method on two popular semantic video object segmentation benchmarks and demonstrate that it advances the state of the art, achieving higher accuracy than other leading methods.
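To make the transductive inference idea concrete, below is a minimal, generic sketch of transductive label propagation on a graph (in the spirit of Zhou et al., "Learning with Local and Global Consistency"). It is not the authors' exact model: the graph construction, affinity weights, and the alpha parameter are illustrative assumptions; in the paper, nodes would correspond to entities such as tracklets or superpixels linked by appearance and motion affinities.

import numpy as np

def transductive_propagation(W, Y, alpha=0.85):
    """Propagate seed labels Y over graph W (closed-form fixed point).

    W     : (n, n) symmetric non-negative affinity matrix between nodes
            (hypothetically, tracklets/superpixels with appearance/motion
            affinities; the real construction is defined in the paper).
    Y     : (n, c) seed label matrix; all-zero rows for unlabeled nodes.
    alpha : trade-off between neighborhood smoothness and seed fidelity.
    Returns an (n, c) soft label matrix; argmax per row gives hard labels.
    """
    # Symmetrically normalize the affinity matrix: S = D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(d), 0.0)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    n = W.shape[0]
    # Closed-form solution of the fixed point F = alpha*S@F + (1-alpha)*Y.
    return np.linalg.solve(np.eye(n) - alpha * S, (1 - alpha) * Y)

# Toy usage: a 4-node chain graph with 2 classes; nodes 0 and 3 are seeds.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Y = np.array([[1, 0], [0, 0], [0, 0], [0, 1]], dtype=float)
print(transductive_propagation(W, Y).argmax(axis=1))  # -> [0 0 1 1]

In this toy example, the two unlabeled interior nodes inherit the label of their nearer seed, illustrating how short-range noisy evidence can be smoothed by longer-range graph structure.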

Cite

Text

Wang et al. "Cross-Granularity Graph Inference for Semantic Video Object Segmentation." International Joint Conference on Artificial Intelligence, 2017. doi:10.24963/IJCAI.2017/634

Markdown

[Wang et al. "Cross-Granularity Graph Inference for Semantic Video Object Segmentation." International Joint Conference on Artificial Intelligence, 2017.](https://mlanthology.org/ijcai/2017/wang2017ijcai-cross/) doi:10.24963/IJCAI.2017/634

BibTeX

@inproceedings{wang2017ijcai-cross,
  title     = {{Cross-Granularity Graph Inference for Semantic Video Object Segmentation}},
  author    = {Wang, Huiling and Wang, Tinghuai and Chen, Ke and Kämäräinen, Joni-Kristian},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2017},
  pages     = {4544--4550},
  doi       = {10.24963/IJCAI.2017/634},
  url       = {https://mlanthology.org/ijcai/2017/wang2017ijcai-cross/}
}