Cross-Granularity Graph Inference for Semantic Video Object Segmentation
Abstract
We address semantic video object segmentation with a novel cross-granularity hierarchical graphical model that integrates tracklet and object proposal reasoning with superpixel labeling. Tracklets characterize the varying spatial-temporal relations of video objects, but they often suffer from sporadic local outliers. To acquire high-quality tracklets, we propose a transductive inference model capable of calibrating short-range, noisy object tracklets with respect to long-range dependencies and high-level context cues. At the core of this work lies a new paradigm of semantic video object segmentation that goes beyond modeling the appearance and motion of objects locally: the semantic label is inferred by jointly exploiting multi-scale contextual information and the spatial-temporal relations of video objects. We evaluate our method on two popular semantic video object segmentation benchmarks and demonstrate that it advances the state of the art, achieving higher accuracy than other leading methods.
Cite
Text
Wang et al. "Cross-Granularity Graph Inference for Semantic Video Object Segmentation." International Joint Conference on Artificial Intelligence, 2017. doi:10.24963/IJCAI.2017/634
Markdown
[Wang et al. "Cross-Granularity Graph Inference for Semantic Video Object Segmentation." International Joint Conference on Artificial Intelligence, 2017.](https://mlanthology.org/ijcai/2017/wang2017ijcai-cross/) doi:10.24963/IJCAI.2017/634
BibTeX
@inproceedings{wang2017ijcai-cross,
title = {{Cross-Granularity Graph Inference for Semantic Video Object Segmentation}},
author = {Wang, Huiling and Wang, Tinghuai and Chen, Ke and Kämäräinen, Joni-Kristian},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2017},
pages = {4544--4550},
doi = {10.24963/IJCAI.2017/634},
url = {https://mlanthology.org/ijcai/2017/wang2017ijcai-cross/}
}