Discriminative Figure-Centric Models for Joint Action Localization and Recognition
Abstract
In this paper we develop an algorithm for action recognition and localization in videos. The algorithm uses a figure-centric visual word representation. Unlike previous approaches, it does not require reliable human detection and tracking as input. Instead, the person location is treated as a latent variable that is inferred simultaneously with action recognition. A spatial model for an action is learned in a discriminative fashion under a figure-centric representation, and temporal smoothness over video sequences is also enforced. We present results on the UCF-Sports dataset, verifying the effectiveness of our model in situations where detection and tracking of individuals are challenging.
Cite
Text
Lan et al. "Discriminative Figure-Centric Models for Joint Action Localization and Recognition." IEEE/CVF International Conference on Computer Vision, 2011. doi:10.1109/ICCV.2011.6126472
Markdown
[Lan et al. "Discriminative Figure-Centric Models for Joint Action Localization and Recognition." IEEE/CVF International Conference on Computer Vision, 2011.](https://mlanthology.org/iccv/2011/lan2011iccv-discriminative/) doi:10.1109/ICCV.2011.6126472
BibTeX
@inproceedings{lan2011iccv-discriminative,
title = {{Discriminative Figure-Centric Models for Joint Action Localization and Recognition}},
author = {Lan, Tian and Wang, Yang and Mori, Greg},
booktitle = {IEEE/CVF International Conference on Computer Vision},
year = {2011},
pages = {2003--2010},
doi = {10.1109/ICCV.2011.6126472},
url = {https://mlanthology.org/iccv/2011/lan2011iccv-discriminative/}
}