Segmentation Free Object Discovery in Video

Abstract

In this paper we present a simple yet effective approach to extend, without supervision, any object proposal method from static images to videos. Unlike previous methods, these spatio-temporal proposals, which we refer to as "tracks", are generated with little or no reliance on visual content, exploiting only the spatial correlations of bounding boxes through time. The resulting tracks are likely to represent objects and serve as a general-purpose tool for representing meaningful video content across a wide variety of tasks. For unannotated videos, tracks can be used to discover content without any supervision. As a further contribution, we propose a novel, dataset-independent method to evaluate a generic object proposal based on the entropy of a classifier's output response. We experiment on two competitive datasets, namely YouTube Objects [6] and ILSVRC-2015 VID [7].

Cite

Text

Cuffaro et al. "Segmentation Free Object Discovery in Video." European Conference on Computer Vision, 2016. doi:10.1007/978-3-319-49409-8_4

Markdown

[Cuffaro et al. "Segmentation Free Object Discovery in Video." European Conference on Computer Vision, 2016.](https://mlanthology.org/eccv/2016/cuffaro2016eccv-segmentation/) doi:10.1007/978-3-319-49409-8_4

BibTeX

@inproceedings{cuffaro2016eccv-segmentation,
  title     = {{Segmentation Free Object Discovery in Video}},
  author    = {Cuffaro, Giovanni and Becattini, Federico and Baecchi, Claudio and Seidenari, Lorenzo and Del Bimbo, Alberto},
  booktitle = {European Conference on Computer Vision},
  year      = {2016},
  pages     = {25--31},
  doi       = {10.1007/978-3-319-49409-8_4},
  url       = {https://mlanthology.org/eccv/2016/cuffaro2016eccv-segmentation/}
}