Object Level Grouping for Video Shots

Abstract

We describe a method for automatically associating image patches from frames of a movie shot into object-level groups. The method employs both the appearance and motion of the patches. There are two areas of innovation: first, affine-invariant regions are used to repair short gaps in individual tracks and also to join sets of tracks across occlusions (where many tracks are lost simultaneously); second, a robust affine factorization method is developed which is able to cope with motion degeneracy. This factorization is used to associate tracks into object-level groups. The outcome is that separate parts of an object that are never visible simultaneously in a single frame are associated together, for example the front and back of a car, or the front and side of a face. In turn, this enables object-level matching and recognition throughout a video. We illustrate the method on a number of shots from the feature film ‘Groundhog Day’.
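
The grouping step rests on the idea that tracks from one rigidly moving object, viewed by an affine camera, share a low-rank motion model. The sketch below illustrates only the classical rank-3 constraint on the centred measurement matrix, using NumPy and synthetic data. It is not the robust factorization described in the abstract: it assumes complete tracks (no missing data), no outliers, and non-degenerate motion. The function names `synthetic_tracks` and `affine_rank3_residual` are hypothetical and chosen for illustration.

```python
import numpy as np


def synthetic_tracks(rng, num_frames, num_points):
    """Project random 3D points through a random affine camera per frame.

    Returns an array of shape (num_frames, num_points, 2).
    Purely synthetic data, used here only for illustration.
    """
    cameras = rng.normal(size=(num_frames, 2, 3))     # 2x3 affine camera per frame
    offsets = rng.normal(size=(num_frames, 2, 1))     # 2D translation per frame
    points = rng.normal(size=(3, num_points))         # rigid 3D structure
    projected = cameras @ points + offsets            # (num_frames, 2, num_points)
    return np.transpose(projected, (0, 2, 1))


def affine_rank3_residual(tracks):
    """RMS residual of the best rank-3 fit to the centred measurement matrix.

    tracks has shape (F, P, 2). A value near zero is consistent with all
    tracks following a single rigid affine motion.
    """
    # Stack x rows over y rows to form the 2F x P measurement matrix.
    measurements = np.concatenate([tracks[:, :, 0], tracks[:, :, 1]], axis=0)
    # Subtracting each row's mean removes the per-frame translation.
    centred = measurements - measurements.mean(axis=1, keepdims=True)
    singular_values = np.linalg.svd(centred, compute_uv=False)
    # Energy left outside the first three singular values.
    return float(np.sqrt(np.sum(singular_values[3:] ** 2) / centred.size))


rng = np.random.default_rng(0)
object_a = synthetic_tracks(rng, num_frames=6, num_points=10)
object_b = synthetic_tracks(rng, num_frames=6, num_points=10)

residual_single = affine_rank3_residual(object_a)
# Tracks from two independently moving objects, pooled into one candidate group.
residual_mixed = affine_rank3_residual(np.concatenate([object_a, object_b], axis=1))
```

In this toy setting `residual_single` is numerically zero, while `residual_mixed` is clearly larger, which is the signal a grouping procedure can use to decide whether a set of tracks belongs to one object. Real tracks are noisy and incomplete, which is why a robust variant is needed in practice.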

Cite

Text

Sivic et al. "Object Level Grouping for Video Shots." European Conference on Computer Vision, 2004. doi:10.1007/978-3-540-24671-8_7

Markdown

[Sivic et al. "Object Level Grouping for Video Shots." European Conference on Computer Vision, 2004.](https://mlanthology.org/eccv/2004/sivic2004eccv-object/) doi:10.1007/978-3-540-24671-8_7

BibTeX

@inproceedings{sivic2004eccv-object,
  title     = {{Object Level Grouping for Video Shots}},
  author    = {Sivic, Josef and Schaffalitzky, Frederik and Zisserman, Andrew},
  booktitle = {European Conference on Computer Vision},
  year      = {2004},
  pages     = {85--98},
  doi       = {10.1007/978-3-540-24671-8_7},
  url       = {https://mlanthology.org/eccv/2004/sivic2004eccv-object/}
}