A Multi-Scale Boosted Detector for Efficient and Robust Gesture Recognition
Abstract
We present an approach to detecting and recognizing gestures in a stream of multi-modal data. Our approach combines a sliding-window gesture detector with features drawn from skeleton data, color imagery, and depth data produced by a first-generation Kinect sensor. The detector consists of a set of one-versus-all boosted classifiers, each tuned to a specific gesture. Features are extracted at multiple temporal scales, and include descriptive statistics of normalized skeleton joint positions, angles, and velocities, as well as image-based hand descriptors. The full set of gesture detectors may be trained in under two hours on a single machine, and is extremely efficient at runtime, operating at 1700 fps using only skeletal data, or at 100 fps using fused skeleton and image features. Our method achieved a Jaccard Index score of 0.834 on the ChaLearn-2014 Gesture Recognition Test dataset, and was ranked 2nd overall in the competition.
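The abstract's feature pipeline (descriptive statistics of joint positions and velocities pooled over windows at several temporal scales, yielding one fixed-length descriptor per sliding-window position) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the window lengths, the choice of statistics, and the flat joint layout are assumptions, and the paper's skeleton normalization, joint angles, and hand descriptors are omitted.

```python
import numpy as np

def multiscale_features(window, scales=(8, 16, 32)):
    """Pool descriptive statistics of joint positions and velocities
    over several temporal sub-window lengths.

    window : (T, D) array of T frames, each a flattened set of
             D joint coordinates (layout is an assumption here).
    Returns a fixed-length 1-D descriptor for this window position.
    """
    velocities = np.diff(window, axis=0)  # frame-to-frame joint velocities
    feats = []
    for scale in scales:
        for series in (window, velocities):
            recent = series[-scale:]            # most recent `scale` frames
            feats.append(recent.mean(axis=0))   # per-coordinate mean
            feats.append(recent.std(axis=0))    # per-coordinate spread
    return np.concatenate(feats)

# Toy stream: 40 frames of 20 joints x 3 coordinates, flattened to D=60.
rng = np.random.default_rng(0)
stream = rng.normal(size=(40, 60))
descriptor = multiscale_features(stream)
print(descriptor.shape)  # (720,): 3 scales x 2 series x (60 means + 60 stds)
```

In a one-versus-all setup as described in the abstract, each gesture's boosted classifier would score this same descriptor independently, and a detection is declared when a classifier's score exceeds its threshold.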
Cite
Text
Monnier et al. "A Multi-Scale Boosted Detector for Efficient and Robust Gesture Recognition." European Conference on Computer Vision Workshops, 2014. doi:10.1007/978-3-319-16178-5_34
Markdown
[Monnier et al. "A Multi-Scale Boosted Detector for Efficient and Robust Gesture Recognition." European Conference on Computer Vision Workshops, 2014.](https://mlanthology.org/eccvw/2014/monnier2014eccvw-multiscale/) doi:10.1007/978-3-319-16178-5_34
BibTeX
@inproceedings{monnier2014eccvw-multiscale,
title = {{A Multi-Scale Boosted Detector for Efficient and Robust Gesture Recognition}},
author = {Monnier, Camille and German, Stan and Ost, Andrey},
booktitle = {European Conference on Computer Vision Workshops},
year = {2014},
  pages = {491--502},
doi = {10.1007/978-3-319-16178-5_34},
url = {https://mlanthology.org/eccvw/2014/monnier2014eccvw-multiscale/}
}