Object Level Grouping for Video Shots

Sivic, Josef; Schaffalitzky, Frederik; Zisserman, Andrew

doi:10.1007/978-3-540-24671-8_7

ECCV 2004 pp. 85-98

Abstract

We describe a method for automatically associating image patches from frames of a movie shot into object-level groups. The method employs both the appearance and motion of the patches. There are two areas of innovation: first, affine invariant regions are used to repair short gaps in individual tracks and also to join sets of tracks across occlusions (where many tracks are lost simultaneously); second, a robust affine factorization method is developed which is able to cope with motion degeneracy. This factorization is used to associate tracks into object-level groups. The outcome is that separate parts of an object that are never visible simultaneously in a single frame are associated together, for example the front and back of a car, or the front and side of a face. In turn, this enables object-level matching and recognition throughout a video. We illustrate the method for a number of shots from the feature film ‘Groundhog Day’.
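The idea of grouping tracks by affine factorization can be illustrated with the classical rank constraint: under an affine camera, the centred measurement matrix of tracks from a single rigidly moving object has rank at most 3, while mixing in tracks from a differently moving object generally raises the rank. The sketch below is only an illustration of that constraint on noise-free synthetic data. It is not the robust method developed in the paper, it does not handle missing data or motion degeneracy, and all cameras, points, and function names (`project`, `center_rows`, `rank`) are made up for the example.

```python
from fractions import Fraction

def project(cameras, points):
    """Stack affine projections x = M X + t into a 2F x P measurement matrix."""
    rows = []
    for M, t in cameras:  # M is a 2x3 matrix, t is a 2-vector (hypothetical values)
        for r in range(2):
            rows.append([Fraction(sum(M[r][k] * X[k] for k in range(3)) + t[r])
                         for X in points])
    return rows

def center_rows(W):
    """Subtract each row's mean, which cancels the per-frame translation."""
    out = []
    for row in W:
        mean = sum(row) / len(row)
        out.append([v - mean for v in row])
    return out

def rank(W):
    """Exact rank by Gaussian elimination over rationals (no tolerance needed)."""
    A = [row[:] for row in W]
    r = 0
    for c in range(len(A[0])):
        pivot = next((i for i in range(r, len(A)) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(r + 1, len(A)):
            f = A[i][c] / A[r][c]
            if f != 0:
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == len(A):
            break
    return r

# Synthetic object A: five non-coplanar points seen in three frames.
points_a = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
cameras_a = [
    ([[1, 0, 0], [0, 1, 0]], (0, 0)),
    ([[1, 0, 1], [0, 1, 0]], (2, 1)),
    ([[1, 0, 0], [0, 1, 1]], (-1, 3)),
]

# Synthetic object B: different points undergoing a different motion.
points_b = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
cameras_b = [
    ([[1, 0, 0], [0, 1, 0]], (5, 5)),
    ([[1, 0, 0], [0, 1, 1]], (6, 5)),
    ([[0, -1, 0], [1, 0, 0]], (7, 4)),
]

W_a = project(cameras_a, points_a)
W_b = project(cameras_b, points_b)
# Tracks from both objects side by side, one column per track.
W_mixed = [ra + rb for ra, rb in zip(W_a, W_b)]

rank_a = rank(center_rows(W_a))
rank_b = rank(center_rows(W_b))
rank_mixed = rank(center_rows(W_mixed))

print("rank, object A only:", rank_a)
print("rank, object B only:", rank_b)
print("rank, A and B mixed:", rank_mixed)
```

Exact rational arithmetic is used here only so that the rank test needs no numerical threshold. With real, noisy tracks one would instead look at singular values and fit the rank-3 model robustly.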

Cite

Text

Sivic et al. "Object Level Grouping for Video Shots." European Conference on Computer Vision, 2004. doi:10.1007/978-3-540-24671-8_7

Markdown

[Sivic et al. "Object Level Grouping for Video Shots." European Conference on Computer Vision, 2004.](https://mlanthology.org/eccv/2004/sivic2004eccv-object/) doi:10.1007/978-3-540-24671-8_7

BibTeX

@inproceedings{sivic2004eccv-object,
  title     = {{Object Level Grouping for Video Shots}},
  author    = {Sivic, Josef and Schaffalitzky, Frederik and Zisserman, Andrew},
  booktitle = {European Conference on Computer Vision},
  year      = {2004},
  pages     = {85-98},
  doi       = {10.1007/978-3-540-24671-8_7},
  url       = {https://mlanthology.org/eccv/2004/sivic2004eccv-object/}
}