Efficient Image and Video Co-Localization with Frank-Wolfe Algorithm

Abstract

In this paper, we tackle the problem of performing efficient co-localization in images and videos. Co-localization is the problem of simultaneously localizing (with bounding boxes) objects of the same class across a set of distinct images or videos. Building upon recent state-of-the-art methods, we show how we are able to naturally incorporate temporal terms and constraints for video co-localization into a quadratic programming framework. Furthermore, by leveraging the Frank-Wolfe algorithm (or conditional gradient), we show how our optimization formulations for both images and videos can be reduced to solving a succession of simple integer programs, leading to increased efficiency in both memory and speed. To validate our method, we present experimental results on the PASCAL VOC 2007 dataset for images and the YouTube-Objects dataset for videos, as well as a joint combination of the two.
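The reduction described in the abstract, where each Frank-Wolfe step solves a simple integer program, can be illustrated on a toy problem. The sketch below is not the paper's objective or code: it assumes a generic convex quadratic over a relaxed "pick one box per image" constraint set, and the names `frank_wolfe_boxes`, `Q`, `c`, `n_images`, and `n_boxes` are hypothetical. Under that assumption, the linear subproblem at each iteration is solved by picking, for each image, the box with the smallest gradient entry.

```python
import numpy as np

def frank_wolfe_boxes(Q, c, n_images, n_boxes, n_iters=200, tol=1e-8):
    """Minimize 0.5 * x^T Q x + c^T x where x holds one probability
    simplex per image (relaxation of 'select exactly one box per image').
    Q is assumed symmetric positive semidefinite (an assumption of this sketch).
    """
    # Start from the uniform point: every box equally weighted in each image.
    x = np.full(n_images * n_boxes, 1.0 / n_boxes)
    for _ in range(n_iters):
        grad = Q @ x + c
        # Linear minimization step: a trivial integer program that selects,
        # independently for each image, the box with the smallest gradient.
        g = grad.reshape(n_images, n_boxes)
        s = np.zeros_like(g)
        s[np.arange(n_images), g.argmin(axis=1)] = 1.0
        d = s.ravel() - x
        gap = -grad @ d  # Frank-Wolfe duality gap, nonnegative
        if gap < tol:
            break
        # Exact line search for a quadratic objective, clipped to [0, 1].
        curv = d @ Q @ d
        step = 1.0 if curv <= 0 else min(1.0, gap / curv)
        x = x + step * d
    return x.reshape(n_images, n_boxes)
```

Each row of the returned array is a distribution over the candidate boxes of one image; taking `argmax` along each row is one simple way to round it to a single box. Only a gradient and a per-image `argmin` are needed per iteration, which is the kind of memory and speed saving the abstract alludes to.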

Cite

Text

Joulin et al. "Efficient Image and Video Co-Localization with Frank-Wolfe Algorithm." European Conference on Computer Vision, 2014. doi:10.1007/978-3-319-10599-4_17

Markdown

[Joulin et al. "Efficient Image and Video Co-Localization with Frank-Wolfe Algorithm." European Conference on Computer Vision, 2014.](https://mlanthology.org/eccv/2014/joulin2014eccv-efficient/) doi:10.1007/978-3-319-10599-4_17

BibTeX

@inproceedings{joulin2014eccv-efficient,
  title     = {{Efficient Image and Video Co-Localization with Frank-Wolfe Algorithm}},
  author    = {Joulin, Armand and Tang, Kevin D. and Fei-Fei, Li},
  booktitle = {European Conference on Computer Vision},
  year      = {2014},
  pages     = {253--268},
  doi       = {10.1007/978-3-319-10599-4_17},
  url       = {https://mlanthology.org/eccv/2014/joulin2014eccv-efficient/}
}