Space-Time Robust Representation for Action Recognition

Abstract

We address the problem of action recognition in unconstrained videos. We propose a novel content-driven pooling that leverages space-time context while being robust to global space-time transformations. Being robust to such transformations is of primary importance in unconstrained videos, where the localization of an action can shift drastically between frames. Our pooling identifies regions of interest using video structural cues estimated by different saliency functions. To combine the different structural cues, we introduce an iterative structure learning algorithm, WSVM (weighted SVM), that determines the optimal saliency layout of an action model through a sparse regularizer. A new optimization method is proposed to solve the highly non-smooth objective function of WSVM. We evaluate our approach on standard action datasets (KTH, UCF50 and HMDB). Most noticeably, the accuracy of our algorithm reaches 51.8% on the challenging HMDB dataset, a relative improvement over the state-of-the-art.
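The pooling step described in the abstract can be pictured as follows: local space-time descriptors are quantized against a visual codebook and accumulated into a histogram, with each descriptor's contribution weighted by a saliency function evaluated at its location; one histogram per saliency cue is then concatenated, and a sparse linear classifier over that concatenation plays the role of WSVM's learned saliency layout. The sketch below is a minimal NumPy illustration of that idea, not the authors' implementation; the function name, the grid-indexed saliency maps, and the hard vector-quantization step are all assumptions for the example.

```python
import numpy as np


def saliency_weighted_pooling(descriptors, positions, saliency_map, codebook):
    """Pool local descriptors into a bag-of-words histogram, weighting
    each descriptor by the saliency value at its (x, y, t) location.

    descriptors:  (N, D) local space-time features
    positions:    (N, 3) integer (x, y, t) grid coordinates
    saliency_map: (X, Y, T) saliency values sampled on that grid
    codebook:     (K, D) visual vocabulary
    """
    # Hard-assign each descriptor to its nearest codeword (toy quantizer).
    dists = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    assignments = dists.argmin(axis=1)
    # Look up the saliency value at each descriptor's position.
    weights = saliency_map[positions[:, 0], positions[:, 1], positions[:, 2]]
    # Accumulate saliency-weighted counts per codeword.
    hist = np.zeros(len(codebook))
    np.add.at(hist, assignments, weights)
    # L1-normalize so clips of different lengths remain comparable.
    return hist / max(hist.sum(), 1e-12)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    descriptors = rng.normal(size=(500, 64))           # e.g. HOG/HOF-like features
    positions = rng.integers(0, 32, size=(500, 3))     # (x, y, t) grid coordinates
    codebook = rng.normal(size=(128, 64))              # visual vocabulary
    saliency_maps = [rng.random((32, 32, 32)) for _ in range(3)]  # toy saliency cues
    # One histogram per saliency cue; a sparse (e.g. L1-regularized) linear
    # classifier over the concatenation would then select the informative
    # regions, mimicking the sparse saliency layout learned by WSVM.
    feature = np.concatenate(
        [saliency_weighted_pooling(descriptors, positions, s, codebook)
         for s in saliency_maps]
    )
    print(feature.shape)  # (384,) = 3 saliency cues x 128 codewords
```

Because the per-cue histograms are simply concatenated, a sparse regularizer on the classifier weights zeroes out uninformative cue/codeword pairs, which is the intuition behind the saliency layout selection in the paper.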

Cite

Text

Ballas et al. "Space-Time Robust Representation for Action Recognition." International Conference on Computer Vision, 2013. doi:10.1109/ICCV.2013.336

Markdown

[Ballas et al. "Space-Time Robust Representation for Action Recognition." International Conference on Computer Vision, 2013.](https://mlanthology.org/iccv/2013/ballas2013iccv-spacetime/) doi:10.1109/ICCV.2013.336

BibTeX

@inproceedings{ballas2013iccv-spacetime,
  title     = {{Space-Time Robust Representation for Action Recognition}},
  author    = {Ballas, Nicolas and Yang, Yi and Lan, Zhen-Zhong and Delezoide, Bertrand and Preteux, Francoise and Hauptmann, Alexander},
  booktitle = {International Conference on Computer Vision},
  year      = {2013},
  doi       = {10.1109/ICCV.2013.336},
  url       = {https://mlanthology.org/iccv/2013/ballas2013iccv-spacetime/}
}