CATS: Co-Saliency Activated Tracklet Selection for Video Co-Localization
Abstract
Video co-localization is the task of jointly localizing common objects across videos. Because the common objects' appearance varies both across videos and within each video, identifying and tracking them without any supervision is challenging. In contrast to previous joint frameworks that use bounding box proposals to tackle the problem, we propose to leverage co-saliency activated tracklets to address the challenge. To identify the common visual object, we first explore inter-video commonness, intra-video commonness, and motion saliency to generate the co-saliency maps. Object proposals with high objectness and co-saliency scores are tracked across short video intervals to build tracklets. The best tube for a video is then obtained by selecting one tracklet per interval via dynamic programming, based on tracklet confidence and the smoothness between adjacent tracklets. Experimental results on the benchmark YouTube Object dataset show that the proposed method outperforms state-of-the-art methods.
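The tracklet-selection step lends itself to a standard dynamic-programming formulation. Below is a minimal sketch under stated assumptions: each interval yields candidate tracklets with a precomputed confidence (the abstract's objectness plus co-saliency score), and smoothness between adjacent tracklets is approximated here by the IoU of the boundary boxes. The `Tracklet`, `smoothness`, and `select_tube` names are hypothetical illustrations, not the paper's exact formulation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Tracklet:
    boxes: List[Tuple[float, float, float, float]]  # (x, y, w, h) per frame
    confidence: float  # e.g., combined objectness + co-saliency score

def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def smoothness(prev: Tracklet, curr: Tracklet) -> float:
    """Overlap between the last box of one tracklet and the first box of
    the next; a stand-in for the paper's smoothness term."""
    return iou(prev.boxes[-1], curr.boxes[0])

def select_tube(intervals: List[List[Tracklet]]) -> List[Tracklet]:
    """Pick one tracklet per interval, maximizing total confidence plus
    smoothness between adjacent picks, via dynamic programming."""
    # score[i][j]: best total score ending with tracklet j in interval i
    score = [[t.confidence for t in intervals[0]]]
    back = []  # back[i-1][j]: best predecessor index for tracklet j in interval i
    for i in range(1, len(intervals)):
        row, brow = [], []
        for t in intervals[i]:
            cands = [score[i - 1][k] + smoothness(p, t)
                     for k, p in enumerate(intervals[i - 1])]
            k_best = max(range(len(cands)), key=cands.__getitem__)
            row.append(t.confidence + cands[k_best])
            brow.append(k_best)
        score.append(row)
        back.append(brow)
    # Backtrack from the best final tracklet to recover the whole tube.
    j = max(range(len(score[-1])), key=score[-1].__getitem__)
    tube = [intervals[-1][j]]
    for i in range(len(intervals) - 2, -1, -1):
        j = back[i][j]
        tube.insert(0, intervals[i][j])
    return tube
```

The global optimum over all per-interval choices is found in time linear in the number of intervals and quadratic in the candidates per interval, which is what makes exhaustive tube search unnecessary.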
Cite
Text
Jerripothula et al. "CATS: Co-Saliency Activated Tracklet Selection for Video Co-Localization." European Conference on Computer Vision, 2016. doi:10.1007/978-3-319-46478-7_12
Markdown
[Jerripothula et al. "CATS: Co-Saliency Activated Tracklet Selection for Video Co-Localization." European Conference on Computer Vision, 2016.](https://mlanthology.org/eccv/2016/jerripothula2016eccv-cats/) doi:10.1007/978-3-319-46478-7_12
BibTeX
@inproceedings{jerripothula2016eccv-cats,
title = {{CATS: Co-Saliency Activated Tracklet Selection for Video Co-Localization}},
author = {Jerripothula, Koteswar Rao and Cai, Jianfei and Yuan, Junsong},
booktitle = {European Conference on Computer Vision},
year = {2016},
pages = {187--202},
doi = {10.1007/978-3-319-46478-7_12},
url = {https://mlanthology.org/eccv/2016/jerripothula2016eccv-cats/}
}