Discriminative Figure-Centric Models for Joint Action Localization and Recognition

Abstract

In this paper we develop an algorithm for action recognition and localization in videos. The algorithm uses a figure-centric visual word representation. Unlike previous approaches, it does not require reliable human detection and tracking as input. Instead, the person location is treated as a latent variable that is inferred simultaneously with action recognition. A spatial model for an action is learned in a discriminative fashion under a figure-centric representation, and temporal smoothness over video sequences is also enforced. We present results on the UCF-Sports dataset, verifying the effectiveness of our model in situations where detection and tracking of individuals is challenging.
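The abstract's idea of inferring a latent person location per frame while enforcing temporal smoothness can be sketched as chain inference over candidate locations. This is a minimal illustration only, not the paper's actual model: the `unary` scores, the `smoothness` penalty matrix, and the Viterbi-style reduction are all simplifying assumptions made here for exposition.

```python
import numpy as np

def infer_locations(unary, smoothness):
    """Toy Viterbi-style inference over latent person locations.

    unary: (T, K) array of hypothetical appearance scores for K candidate
           bounding boxes in each of T frames (an assumption; the paper's
           actual potentials are richer).
    smoothness: (K, K) array of penalties for jumping between candidate
           locations in consecutive frames (the temporal smoothness term).
    Returns the best total score and the best location index per frame.
    """
    T, K = unary.shape
    dp = np.zeros((T, K))           # dp[t, k]: best score ending at location k
    back = np.zeros((T, K), dtype=int)
    dp[0] = unary[0]
    for t in range(1, T):
        # score of each (previous, current) transition
        trans = dp[t - 1][:, None] - smoothness   # shape (K_prev, K_cur)
        back[t] = trans.argmax(axis=0)
        dp[t] = unary[t] + trans.max(axis=0)
    # backtrack the highest-scoring location sequence
    path = [int(dp[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return float(dp[-1].max()), path
```

With a zero smoothness penalty the sketch simply picks the best-scoring box in each frame; a large off-diagonal penalty forces the inferred track to stay put, which is the qualitative effect the temporal term is meant to have.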

Cite

Text

Lan et al. "Discriminative Figure-Centric Models for Joint Action Localization and Recognition." IEEE/CVF International Conference on Computer Vision, 2011. doi:10.1109/ICCV.2011.6126472

Markdown

[Lan et al. "Discriminative Figure-Centric Models for Joint Action Localization and Recognition." IEEE/CVF International Conference on Computer Vision, 2011.](https://mlanthology.org/iccv/2011/lan2011iccv-discriminative/) doi:10.1109/ICCV.2011.6126472

BibTeX

@inproceedings{lan2011iccv-discriminative,
  title     = {{Discriminative Figure-Centric Models for Joint Action Localization and Recognition}},
  author    = {Lan, Tian and Wang, Yang and Mori, Greg},
  booktitle = {IEEE/CVF International Conference on Computer Vision},
  year      = {2011},
  pages     = {2003--2010},
  doi       = {10.1109/ICCV.2011.6126472},
  url       = {https://mlanthology.org/iccv/2011/lan2011iccv-discriminative/}
}