Pipelining Localized Semantic Features for Fine-Grained Action Recognition
Abstract
In fine-grained action (object manipulation) recognition, it is important to encode object semantic (contextual) information, i.e., which object is being manipulated and how it is being manipulated. However, previous methods for action recognition often represent semantic information in a global, coarse way and therefore cannot cope with fine-grained actions. In this work, we propose a representation and classification pipeline that seamlessly incorporates localized semantic information into every processing step of fine-grained action recognition. In the feature extraction stage, we explore the geometric relationships between local motion features and the surrounding objects. In the feature encoding stage, we develop a semantic-grouped locality-constrained linear coding (SG-LLC) method that captures the joint distribution of motion and object-in-use information. Finally, we propose a semantic-aware multiple kernel learning (SA-MKL) framework that exploits the empirical joint distribution of action and object type for more discriminative action classification. Extensive experiments are performed on the large-scale and challenging fine-grained MPII cooking action dataset. The results show that, by effectively accumulating localized semantic information into the action representation and classification pipeline, we significantly improve fine-grained action classification performance over existing methods.
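The abstract builds SG-LLC on locality-constrained linear coding (LLC) but does not describe how the semantic grouping works. As a rough illustration only, the sketch below implements the standard approximated LLC encoder (select the k nearest codewords, solve a ridge-regularized least-squares problem, normalize the weights to sum to one) and then assumes one plausible grouping scheme: each motion feature carries the label of the object it lies near, codes are max-pooled separately per object group, and the per-group vectors are concatenated. The grouping scheme, the function names `llc_code` and `sg_llc_pool`, the object labels, and the parameters `k` and `beta` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def _solve(A: List[Vector], b: Vector) -> Vector:
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))
        w[r] = s / M[r][r]
    return w


def llc_code(x: Sequence[float], codebook: List[Vector], k: int = 5,
             beta: float = 1e-4) -> Vector:
    """Approximated LLC: nonzero weights only on the k nearest codewords."""
    nn = sorted(range(len(codebook)), key=lambda j: _sq_dist(x, codebook[j]))[:k]
    # Shift the selected codewords so that x sits at the origin.
    z = [[bj - xj for bj, xj in zip(codebook[j], x)] for j in nn]
    # Local covariance C[i][j] = z_i . z_j
    C = [[sum(p * q for p, q in zip(zi, zj)) for zj in z] for zi in z]
    trace = sum(C[i][i] for i in range(len(nn)))
    reg = beta * trace if trace > 0 else beta
    for i in range(len(nn)):
        C[i][i] += reg  # ridge term keeps C invertible
    w = _solve(C, [1.0] * len(nn))
    total = sum(w)
    code = [0.0] * len(codebook)
    for j, wj in zip(nn, w):
        code[j] = wj / total  # enforce the sum-to-one constraint
    return code


def sg_llc_pool(features: List[Vector], labels: List[str],
                codebook: List[Vector], groups: List[str],
                k: int = 5) -> Vector:
    """Hypothetical semantic grouping: max-pool LLC codes per object label."""
    codes: Dict[str, List[Vector]] = {g: [] for g in groups}
    for x, g in zip(features, labels):
        codes[g].append(llc_code(x, codebook, k))
    out: Vector = []
    for g in groups:
        if codes[g]:
            # Element-wise max pooling over the features in this group.
            out.extend(max(col) for col in zip(*codes[g]))
        else:
            out.extend([0.0] * len(codebook))  # no features near this object
    return out
```

For example, `sg_llc_pool(feats, ["knife", "bowl"], codebook, ["knife", "bowl", "pan"], k=3)` returns three concatenated codebook-sized blocks, one per object group. Because the same motion codeword lands in a different block depending on the nearby object, the pooled vector reflects a joint motion-object distribution rather than motion alone; whether this matches the paper's SG-LLC is an assumption.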
Cite
Text
Zhou et al. "Pipelining Localized Semantic Features for Fine-Grained Action Recognition." European Conference on Computer Vision, 2014. doi:10.1007/978-3-319-10593-2_32
Markdown
[Zhou et al. "Pipelining Localized Semantic Features for Fine-Grained Action Recognition." European Conference on Computer Vision, 2014.](https://mlanthology.org/eccv/2014/zhou2014eccv-pipelining/) doi:10.1007/978-3-319-10593-2_32
BibTeX
@inproceedings{zhou2014eccv-pipelining,
title = {{Pipelining Localized Semantic Features for Fine-Grained Action Recognition}},
author = {Zhou, Yang and Ni, Bingbing and Yan, Shuicheng and Moulin, Pierre and Tian, Qi},
booktitle = {European Conference on Computer Vision},
year = {2014},
pages = {481--496},
doi = {10.1007/978-3-319-10593-2_32},
url = {https://mlanthology.org/eccv/2014/zhou2014eccv-pipelining/}
}