A Temporal Bayesian Model for Classifying, Detecting and Localizing Activities in Video Sequences

Abstract

We present a framework to detect and localize activities in unconstrained real-life video sequences. This problem is more challenging than activity classification alone, because it subsumes classification and also requires working with unconstrained videos. To obtain real-life data, we focus on the Human Motion Database (HMDB), a collection of realistic video clips. The detection and localization paradigm we introduce uses a keyword model for detecting key activities or gestures in a video sequence, analogous to keyword or key-phrase detection in speech processing. The method learns models for the activities of interest during training; at test time, it is presented with a network of activities (a representation of the video sequence) and must detect the keywords in that network. Our classification approach outperformed all the current state-of-the-art classifiers when tested on two publicly available datasets, KTH and HMDB. We also tested this paradigm for spotting gestures via a one-shot-learning approach on the CHALEARN gesture dataset and obtained very promising results. Our approach was ranked among the top five techniques in the CHALEARN 2012 gesture spotting competition.

Cite

Text

Malgireddy et al. "A Temporal Bayesian Model for Classifying, Detecting and Localizing Activities in Video Sequences." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2012. doi:10.1109/CVPRW.2012.6239185

Markdown

[Malgireddy et al. "A Temporal Bayesian Model for Classifying, Detecting and Localizing Activities in Video Sequences." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2012.](https://mlanthology.org/cvprw/2012/malgireddy2012cvprw-temporal/) doi:10.1109/CVPRW.2012.6239185

BibTeX

@inproceedings{malgireddy2012cvprw-temporal,
  title     = {{A Temporal Bayesian Model for Classifying, Detecting and Localizing Activities in Video Sequences}},
  author    = {Malgireddy, Manavender R. and Nwogu, Ifeoma and Govindaraju, Venu},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2012},
  pages     = {43-48},
  doi       = {10.1109/CVPRW.2012.6239185},
  url       = {https://mlanthology.org/cvprw/2012/malgireddy2012cvprw-temporal/}
}