Weakly Supervised Learning of Object Segmentations from Web-Scale Video

Hartmann, Glenn; Grundmann, Matthias; Hoffman, Judy; Tsai, David; Kwatra, Vivek; Madani, Omid; Vijayanarasimhan, Sudheendra; Essa, Irfan A.; Rehg, James M.; Sukthankar, Rahul

doi:10.1007/978-3-642-33863-2_20

Glenn Hartmann, Matthias Grundmann, Judy Hoffman, David Tsai, Vivek Kwatra, Omid Madani, Sudheendra Vijayanarasimhan, Irfan A. Essa, James M. Rehg, Rahul Sukthankar

ECCVW 2012 pp. 198-208

Abstract

We propose to learn pixel-level segmentations of objects from weakly labeled (tagged) internet videos. Specifically, given a large collection of raw YouTube content, along with potentially noisy tags, our goal is to automatically generate spatiotemporal masks for each object, such as “dog”, without employing any pre-trained object detectors. We formulate this problem as learning weakly supervised classifiers for a set of independent spatiotemporal segments. The object seeds obtained using segment-level classifiers are further refined using graph cuts to generate high-precision object masks. Our results, obtained by training on a dataset of 20,000 YouTube videos weakly tagged into 15 classes, demonstrate automatic extraction of pixel-level object masks. Evaluated against a ground-truthed subset of 50,000 frames with pixel-level annotations, we confirm that our proposed methods can learn good object masks just by watching YouTube.
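
The weak-supervision step described in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not name the features or the classifier, so the example uses synthetic 2-D features and a plain logistic regression, and it omits the graph-cut refinement entirely. The function names (`train_segment_classifier`, `score_segments`, `pick_seeds`) are hypothetical. The idea shown is only that each segment inherits the tag of its video as a noisy label, and that segments scoring high under the learned classifier can serve as object seeds.

```python
import numpy as np

def train_segment_classifier(features, video_ids, video_tags, target,
                             steps=500, lr=0.5):
    """Fit a logistic regression on weak labels (illustrative stand-in classifier)."""
    # Weak label: every segment inherits the tag of the video it came from.
    y = np.array([1.0 if target in video_tags[v] else 0.0 for v in video_ids])
    X = np.hstack([features, np.ones((len(features), 1))])  # append bias column
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probability per segment
        w -= lr * (X.T @ (p - y)) / len(y)   # gradient step on the mean log loss
    return w

def score_segments(features, w):
    """Return the classifier probability for each segment."""
    X = np.hstack([features, np.ones((len(features), 1))])
    return 1.0 / (1.0 + np.exp(-(X @ w)))

def pick_seeds(scores, threshold=0.5):
    """Indices of segments confident enough to act as object seeds."""
    return np.flatnonzero(scores >= threshold)

# Synthetic data (hypothetical): one video tagged "dog" with 50 object segments
# and 50 background segments, and one video tagged "cat" with background only.
rng = np.random.default_rng(0)
dog_segments = rng.normal([2.0, 0.0], 0.3, size=(50, 2))
background_a = rng.normal([0.0, 0.0], 0.3, size=(50, 2))
background_b = rng.normal([0.0, 0.0], 0.3, size=(100, 2))

features = np.vstack([dog_segments, background_a, background_b])
video_ids = ["v0"] * 100 + ["v1"] * 100
video_tags = {"v0": {"dog"}, "v1": {"cat"}}
is_object = np.array([True] * 50 + [False] * 150)  # used for checking only

w = train_segment_classifier(features, video_ids, video_tags, target="dog")
scores = score_segments(features, w)
seeds = pick_seeds(scores)
```

In this toy setup the background segments inside the "dog" video receive the wrong (positive) label, yet they look like the background in the negative video, so the classifier tends to score them lower than the true object segments. A refinement stage such as the graph cuts mentioned in the abstract would then operate on the selected seeds.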

Cite

Text

Hartmann et al. "Weakly Supervised Learning of Object Segmentations from Web-Scale Video." European Conference on Computer Vision Workshops, 2012. doi:10.1007/978-3-642-33863-2_20

Markdown

[Hartmann et al. "Weakly Supervised Learning of Object Segmentations from Web-Scale Video." European Conference on Computer Vision Workshops, 2012.](https://mlanthology.org/eccvw/2012/hartmann2012eccvw-weakly/) doi:10.1007/978-3-642-33863-2_20

BibTeX

@inproceedings{hartmann2012eccvw-weakly,
  title     = {{Weakly Supervised Learning of Object Segmentations from Web-Scale Video}},
  author    = {Hartmann, Glenn and Grundmann, Matthias and Hoffman, Judy and Tsai, David and Kwatra, Vivek and Madani, Omid and Vijayanarasimhan, Sudheendra and Essa, Irfan A. and Rehg, James M. and Sukthankar, Rahul},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2012},
  pages     = {198--208},
  doi       = {10.1007/978-3-642-33863-2_20},
  url       = {https://mlanthology.org/eccvw/2012/hartmann2012eccvw-weakly/}
}