Large-Scale Web Video Event Classification by Use of Fisher Vectors

Abstract

Event recognition has been an important topic in computer vision research due to its many applications. However, most prior work has focused on videos taken from a fixed camera, in known environments, and depicting basic events. Here, we focus on classifying unconstrained web videos into much higher-level activities. We follow the approach of constructing fixed-length feature vectors from local feature descriptors for classification using an SVM. Our key contribution is a study of the utility of the Fisher Vector representation in improving results compared to the conventional Bag-of-Words (BoW) approach. Such coding has been shown to be useful for static image classification in the past but has not been applied to video categorization. We perform tests on the challenging NIST TRECVID Multimedia Event Detection (MED) dataset, which has on the order of a thousand hours of unconstrained user-generated videos; our approach achieves as much as a 35% improvement over the BoW baseline. We also offer an analysis of possible causes of such improvements.
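
As a rough illustration of the two encodings the abstract contrasts, the sketch below turns one video's local descriptors into a fixed-length vector in both ways. It is not the authors' implementation: the diagonal-covariance GMM parameters are random stand-ins for a trained model, the descriptors are synthetic, the function names `fisher_vector` and `bow_histogram` are hypothetical, and the power and L2 normalization steps are assumptions borrowed from common Fisher Vector practice rather than anything stated in the abstract.

```python
import numpy as np

def fisher_vector(X, weights, means, variances):
    """Encode descriptors X (N, D) with a K-component diagonal GMM.

    Returns a vector of length 2*K*D: gradients with respect to the
    component means and standard deviations.
    """
    N, _ = X.shape
    diff = X[:, None, :] - means[None, :, :]                # (N, K, D)
    # Log-density of every descriptor under every Gaussian component.
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances[None], axis=2)
        + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :]
    )
    # Soft assignments (posteriors), computed stably in the log domain.
    log_post = np.log(weights)[None, :] + log_prob
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)               # (N, K)

    u = diff / np.sqrt(variances)[None]                     # whitened residuals
    g_mu = np.einsum("nk,nkd->kd", gamma, u)
    g_mu /= N * np.sqrt(weights)[:, None]
    g_sigma = np.einsum("nk,nkd->kd", gamma, u ** 2 - 1)
    g_sigma /= N * np.sqrt(2 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])
    # Assumed post-processing: signed square root, then L2 normalization.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv

def bow_histogram(X, means):
    """Hard-assign each descriptor to its nearest centre; return a K-bin histogram."""
    d2 = ((X[:, None, :] - means[None]) ** 2).sum(axis=2)   # (N, K)
    hist = np.bincount(d2.argmin(axis=1), minlength=means.shape[0]).astype(float)
    return hist / hist.sum()

# Synthetic stand-ins: K components, D-dimensional descriptors, 200 per video.
rng = np.random.default_rng(0)
K, D = 4, 8
means = rng.normal(size=(K, D))
variances = np.ones((K, D))
weights = np.full(K, 1.0 / K)
X = rng.normal(size=(200, D))

fv = fisher_vector(X, weights, means, variances)
bow = bow_histogram(X, means)
print(fv.shape, bow.shape)  # (64,) (4,)
```

For the same vocabulary size K, the BoW histogram keeps only K occupancy counts, while the Fisher Vector keeps 2·K·D first- and second-order statistics. Either vector could then be passed to an SVM, as the abstract describes.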

Cite

Text

Sun and Nevatia. "Large-Scale Web Video Event Classification by Use of Fisher Vectors." IEEE/CVF Winter Conference on Applications of Computer Vision, 2013. doi:10.1109/WACV.2013.6474994

Markdown

[Sun and Nevatia. "Large-Scale Web Video Event Classification by Use of Fisher Vectors." IEEE/CVF Winter Conference on Applications of Computer Vision, 2013.](https://mlanthology.org/wacv/2013/sun2013wacv-large/) doi:10.1109/WACV.2013.6474994

BibTeX

@inproceedings{sun2013wacv-large,
  title     = {{Large-Scale Web Video Event Classification by Use of Fisher Vectors}},
  author    = {Sun, Chen and Nevatia, Ram},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
  year      = {2013},
  pages     = {15--22},
  doi       = {10.1109/WACV.2013.6474994},
  url       = {https://mlanthology.org/wacv/2013/sun2013wacv-large/}
}