Multi-Scale Deep Learning for Gesture Detection and Localization
Abstract
We present a method for gesture detection and localization based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at two temporal scales. Key to our technique is a training strategy which exploits i) careful initialization of individual modalities; and ii) gradual fusion of modalities from strongest to weakest cross-modality structure. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams.
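The training strategy described above lends itself to a short sketch. The following minimal PyTorch illustration is not the authors' implementation: the Encoder architecture, modality names and dimensions, the fusion order, and the synthetic batches are all assumptions. It only shows the schedule the abstract describes, per-modality pretraining for careful initialization, then fusing modalities into a shared head one at a time.

import torch
import torch.nn as nn

# Assumed modality names and input dimensions (illustrative only).
MODALITIES = {"depth": 64, "intensity": 64, "pose": 32, "audio": 40}
N_CLASSES = 21  # ChaLearn 2014: 20 gesture classes plus a no-gesture class

class Encoder(nn.Module):
    """Per-modality encoder; the architecture here is a placeholder."""
    def __init__(self, in_dim, hid=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU())
    def forward(self, x):
        return self.net(x)

def one_step(params, logits_fn, labels):
    """A single Adam step on cross-entropy; stands in for a full training loop."""
    opt = torch.optim.Adam(params, lr=1e-3)
    opt.zero_grad()
    loss = nn.functional.cross_entropy(logits_fn(), labels)
    loss.backward()
    opt.step()

encoders = {m: Encoder(d) for m, d in MODALITIES.items()}

# i) Careful initialization: pretrain each modality with its own head.
for m, dim in MODALITIES.items():
    head = nn.Linear(128, N_CLASSES)
    x = torch.randn(8, dim)                    # stand-in for real data
    y = torch.randint(0, N_CLASSES, (8,))
    one_step(list(encoders[m].parameters()) + list(head.parameters()),
             lambda m=m, h=head, x=x: h(encoders[m](x)), y)

# ii) Gradual fusion, from strongest to weakest cross-modality
# structure; this particular ordering is an assumption.
fusion_order = ["depth", "intensity", "pose", "audio"]
active = []
for m in fusion_order:
    active.append(m)
    head = nn.Linear(128 * len(active), N_CLASSES)  # fresh head per stage
    xs = [torch.randn(8, MODALITIES[n]) for n in active]
    y = torch.randint(0, N_CLASSES, (8,))
    params = [p for n in active for p in encoders[n].parameters()]
    params += list(head.parameters())
    one_step(params,
             lambda ns=tuple(active), h=head, xs=xs: h(
                 torch.cat([encoders[n](x) for n, x in zip(ns, xs)], dim=-1)),
             y)

Note that encoders fused in earlier stages continue to be updated as new modalities join, so the already-learned cross-modality structure is refined rather than frozen.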
Cite
Text
Neverova et al. "Multi-Scale Deep Learning for Gesture Detection and Localization." European Conference on Computer Vision Workshops, 2014. doi:10.1007/978-3-319-16178-5_33
Markdown
[Neverova et al. "Multi-Scale Deep Learning for Gesture Detection and Localization." European Conference on Computer Vision Workshops, 2014.](https://mlanthology.org/eccvw/2014/neverova2014eccvw-multiscale/) doi:10.1007/978-3-319-16178-5_33
BibTeX
@inproceedings{neverova2014eccvw-multiscale,
  title     = {{Multi-Scale Deep Learning for Gesture Detection and Localization}},
  author    = {Neverova, Natalia and Wolf, Christian and Taylor, Graham W. and Nebout, Florian},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2014},
  pages     = {474--490},
  doi       = {10.1007/978-3-319-16178-5_33},
  url       = {https://mlanthology.org/eccvw/2014/neverova2014eccvw-multiscale/}
}