Action and Gesture Temporal Spotting with Super Vector Representation

Abstract

This paper describes the method we designed for both track 2 and track 3 of the Looking at People (LAP) challenge [1]. We propose an action and gesture spotting system composed of three steps: (i) temporal segmentation, (ii) clip classification, and (iii) post-processing. For track 2, we resort to a simple sliding-window method to divide each video sequence into clips, while for track 3 we design a segmentation method based on motion analysis of the human hands. Then, for each clip, we adopt a super vector representation built on dense features. Based on this representation, we train a linear SVM to perform action and gesture recognition. Finally, we apply several post-processing techniques to remove false positive detections. We demonstrate the effectiveness of the proposed method by participating in the contests of both track 2 and track 3. We obtain the best performance on track 2 and rank $4^{th}$ on track 3, which indicates that the designed system is effective for action and gesture recognition.
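To make the pipeline concrete, below is a minimal sketch of two of its pieces: sliding-window temporal segmentation (track 2) and a super-vector clip encoding. The window and stride values are illustrative, not taken from the paper, and the encoder is a VLAD-style stand-in for the paper's super vector representation (the abstract does not specify which encoding the authors use); the resulting vectors would then feed a linear SVM.

```python
import numpy as np

def sliding_window_clips(n_frames, window=30, stride=15):
    """Split a video of n_frames into overlapping (start, end) clips.

    window/stride are illustrative defaults, not the paper's settings.
    """
    clips = []
    start = 0
    while start + window <= n_frames:
        clips.append((start, start + window))
        start += stride
    # keep a final clip so the tail of the video is still covered
    if not clips or clips[-1][1] < n_frames:
        clips.append((max(0, n_frames - window), n_frames))
    return clips

def super_vector_encode(descriptors, centers):
    """VLAD-style super vector: accumulate residuals of each dense
    descriptor to its nearest codeword, then L2-normalize.

    descriptors: (n, d) dense features pooled from one clip
    centers:     (k, d) codebook learned offline (e.g., by k-means)
    returns:     (k * d,) clip-level super vector
    """
    k, d = centers.shape
    # hard-assign each descriptor to its nearest codeword
    dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
    assign = dists.argmin(axis=1)
    enc = np.zeros((k, d))
    for i, c in enumerate(assign):
        enc[c] += descriptors[i] - centers[c]
    enc = enc.ravel()
    norm = np.linalg.norm(enc)
    return enc / norm if norm > 0 else enc
```

A usage pass would segment each video with `sliding_window_clips`, encode the dense features of every clip with `super_vector_encode`, and score each encoding with a trained linear SVM before post-processing the per-clip predictions.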

Cite

Text

Peng et al. "Action and Gesture Temporal Spotting with Super Vector Representation." European Conference on Computer Vision Workshops, 2014. doi:10.1007/978-3-319-16178-5_36

Markdown

[Peng et al. "Action and Gesture Temporal Spotting with Super Vector Representation." European Conference on Computer Vision Workshops, 2014.](https://mlanthology.org/eccvw/2014/peng2014eccvw-action/) doi:10.1007/978-3-319-16178-5_36

BibTeX

@inproceedings{peng2014eccvw-action,
  title     = {{Action and Gesture Temporal Spotting with Super Vector Representation}},
  author    = {Peng, Xiaojiang and Wang, Limin and Cai, Zhuowei and Qiao, Yu},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2014},
  pages     = {518--527},
  doi       = {10.1007/978-3-319-16178-5_36},
  url       = {https://mlanthology.org/eccvw/2014/peng2014eccvw-action/}
}