A Multi-Scale Boosted Detector for Efficient and Robust Gesture Recognition

Abstract

We present an approach to detecting and recognizing gestures in a stream of multi-modal data. Our approach combines a sliding-window gesture detector with features drawn from skeleton data, color imagery, and depth data produced by a first-generation Kinect sensor. The detector consists of a set of one-versus-all boosted classifiers, each tuned to a specific gesture. Features are extracted at multiple temporal scales, and include descriptive statistics of normalized skeleton joint positions, angles, and velocities, as well as image-based hand descriptors. The full set of gesture detectors may be trained in under two hours on a single machine, and is extremely efficient at runtime, operating at 1700 fps using only skeletal data, or at 100 fps using fused skeleton and image features. Our method achieved a Jaccard Index score of 0.834 on the ChaLearn-2014 Gesture Recognition Test dataset, and was ranked 2nd overall in the competition.
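The two core ideas in the abstract can be illustrated concretely: multi-scale descriptive statistics pooled over a temporal window, fed to a bank of one-versus-all boosted classifiers. The sketch below is not the authors' implementation; it is a minimal NumPy illustration using AdaBoost over decision stumps as the boosted weak-learner ensemble, with hypothetical function and class names (`extract_multiscale_features`, `BoostedOVA`) and toy scale/round settings chosen for readability.

```python
import numpy as np

def extract_multiscale_features(window, scales=(8, 16, 32)):
    """Descriptive statistics (mean/std/min/max) of each input channel,
    computed over progressively shorter temporal spans ending at the
    window's last frame -- a sketch of the multi-scale temporal pooling
    described in the abstract. `window` is (frames, channels)."""
    feats = []
    for s in scales:
        seg = window[-s:]  # last s frames of the sliding window
        feats += [seg.mean(0), seg.std(0), seg.min(0), seg.max(0)]
    return np.concatenate(feats)

class Stump:
    """Axis-aligned decision stump: the weak learner boosted below."""
    def fit(self, X, y, w):
        best = (0, 0.0, 1, np.inf)  # (feature, threshold, polarity, weighted error)
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j]):
                for p in (1, -1):
                    pred = np.where(p * (X[:, j] - t) > 0, 1, -1)
                    err = w[pred != y].sum()
                    if err < best[3]:
                        best = (j, t, p, err)
        self.j, self.t, self.p, err = best
        return err

    def predict(self, X):
        return np.where(self.p * (X[:, self.j] - self.t) > 0, 1, -1)

class BoostedOVA:
    """One-versus-all boosting: one AdaBoost ensemble per gesture class;
    a window is assigned to the class with the highest weighted vote."""
    def __init__(self, n_rounds=8):
        self.n_rounds = n_rounds

    def fit(self, X, labels):
        self.classes = np.unique(labels)
        self.models = {}
        for c in self.classes:
            y = np.where(labels == c, 1, -1)     # this class vs. all others
            w = np.full(len(y), 1.0 / len(y))    # uniform sample weights
            rounds = []
            for _ in range(self.n_rounds):
                s = Stump()
                err = max(s.fit(X, y, w), 1e-10)
                alpha = 0.5 * np.log((1 - err) / err)
                w *= np.exp(-alpha * y * s.predict(X))  # re-weight mistakes
                w /= w.sum()
                rounds.append((alpha, s))
            self.models[c] = rounds
        return self

    def scores(self, X):
        return np.stack([sum(a * s.predict(X) for a, s in self.models[c])
                         for c in self.classes], axis=1)

    def predict(self, X):
        return self.classes[self.scores(X).argmax(1)]
```

At runtime, each position of the sliding window would be featurized with `extract_multiscale_features` and scored by every per-gesture ensemble; the paper's actual detector adds skeleton normalization, hand descriptors, and a tuned feature set on top of this skeleton.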

Cite

Text

Monnier et al. "A Multi-Scale Boosted Detector for Efficient and Robust Gesture Recognition." European Conference on Computer Vision Workshops, 2014. doi:10.1007/978-3-319-16178-5_34

Markdown

[Monnier et al. "A Multi-Scale Boosted Detector for Efficient and Robust Gesture Recognition." European Conference on Computer Vision Workshops, 2014.](https://mlanthology.org/eccvw/2014/monnier2014eccvw-multiscale/) doi:10.1007/978-3-319-16178-5_34

BibTeX

@inproceedings{monnier2014eccvw-multiscale,
  title     = {{A Multi-Scale Boosted Detector for Efficient and Robust Gesture Recognition}},
  author    = {Monnier, Camille and German, Stan and Ost, Andrey},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2014},
  pages     = {491--502},
  doi       = {10.1007/978-3-319-16178-5_34},
  url       = {https://mlanthology.org/eccvw/2014/monnier2014eccvw-multiscale/}
}