Action Recognition with Actons

Abstract

With improved access to an exploding amount of video data and growing demands from a wide range of video analysis applications, video-based action recognition/classification has become an increasingly important task in computer vision. In this paper, we propose a two-layer structure for action recognition that automatically exploits a mid-level "acton" representation. The weakly-supervised actons are learned via a new max-margin multi-channel multiple instance learning framework, which can capture multiple mid-level action concepts simultaneously. The learned actons (with no requirement for detailed manual annotations) are compact, informative, discriminative, and easy to scale. The experimental results demonstrate the effectiveness of applying the learned actons in our two-layer structure, and show state-of-the-art recognition performance on two challenging action datasets, i.e., YouTube and HMDB51.
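The abstract's acton learning builds on max-margin multiple instance learning, where each video is a bag of segment descriptors and only the bag label is known. The sketch below is a minimal, illustrative toy of that general idea only; it does not reproduce the paper's multi-channel formulation or two-layer pipeline, and all names (`learn_acton`, `bag_score`, the feature dimension, the synthetic data) are hypothetical.

```python
import numpy as np

def learn_acton(bags, labels, dim, epochs=30, lr=0.05, reg=1e-3, seed=0):
    """Toy mi-SVM-style max-margin learner for a single 'acton' classifier.

    bags   : list of (n_i, dim) arrays; each row is one video-segment descriptor
    labels : +1 if the video is assumed to contain the acton, -1 otherwise
    Alternates between selecting the highest-scoring instance in each bag
    (the 'witness' for positives, the hardest violator for negatives) and
    taking a hinge-loss subgradient step, i.e. a max-margin update.
    """
    rng = np.random.default_rng(seed)
    w, b = rng.normal(scale=0.01, size=dim), 0.0
    for _ in range(epochs):
        for X, y in zip(bags, labels):
            scores = X @ w + b
            x = X[np.argmax(scores)]          # instance that defines the bag score
            margin = y * (x @ w + b)
            grad_w = reg * w                  # L2 regularization
            if margin < 1.0:                  # hinge loss is active
                grad_w -= y * x
                b += lr * y
            w -= lr * grad_w
    return w, b

def bag_score(w, b, X):
    """Score a video by its best-matching segment (max-pooling over instances)."""
    return float(np.max(X @ w + b))

# Usage on synthetic data: each positive bag hides one 'acton-like' segment.
rng = np.random.default_rng(1)
dim = 16
pos = [np.vstack([rng.normal(size=(4, dim)), rng.normal(loc=2.0, size=(1, dim))])
       for _ in range(20)]
neg = [rng.normal(size=(5, dim)) for _ in range(20)]
w, b = learn_acton(pos + neg, [+1] * 20 + [-1] * 20, dim)
print(np.mean([bag_score(w, b, X) for X in pos]) > np.mean([bag_score(w, b, X) for X in neg]))
```

In the paper's two-layer setting, the bag-level scores of many such acton classifiers would serve as the mid-level representation fed to a second-layer classifier; this sketch only shows the single-classifier, single-channel case.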

Cite

Text

Zhu et al. "Action Recognition with Actons." International Conference on Computer Vision, 2013. doi:10.1109/ICCV.2013.442

Markdown

[Zhu et al. "Action Recognition with Actons." International Conference on Computer Vision, 2013.](https://mlanthology.org/iccv/2013/zhu2013iccv-action/) doi:10.1109/ICCV.2013.442

BibTeX

@inproceedings{zhu2013iccv-action,
  title     = {{Action Recognition with Actons}},
  author    = {Zhu, Jun and Wang, Baoyuan and Yang, Xiaokang and Zhang, Wenjun and Tu, Zhuowen},
  booktitle = {International Conference on Computer Vision},
  year      = {2013},
  doi       = {10.1109/ICCV.2013.442},
  url       = {https://mlanthology.org/iccv/2013/zhu2013iccv-action/}
}