Multi-Scale Deep Learning for Gesture Detection and Localization

Abstract

We present a method for gesture detection and localization based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at two temporal scales. Key to our technique is a training strategy which exploits i) careful initialization of individual modalities; and ii) gradual fusion of modalities from strongest to weakest cross-modality structure. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams.
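The training strategy described above — pretraining each modality separately, then fusing modalities one at a time starting from the strongest — can be sketched in miniature. The toy data, the logistic-regression stand-ins for per-modality networks, and the modality names below are all illustrative assumptions, not the authors' architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data with three "modality" feature blocks of
# decreasing signal strength (hypothetical stand-ins for real channels).
n = 400
y = rng.integers(0, 2, n)
modalities = {
    "strong": y[:, None] + 0.3 * rng.standard_normal((n, 2)),
    "medium": y[:, None] + 0.8 * rng.standard_normal((n, 2)),
    "weak":   y[:, None] + 2.0 * rng.standard_normal((n, 2)),
}

def train_logreg(X, y, steps=500, lr=0.5):
    """Logistic regression by gradient descent -- a stand-in for
    training one per-modality deep network."""
    X = np.hstack([X, np.ones((len(X), 1))])  # append bias column
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    X = np.hstack([X, np.ones((len(X), 1))])
    return float(((X @ w > 0).astype(int) == y).mean())

# i) Careful initialization: train each modality on its own first.
solo = {name: train_logreg(X, y) for name, X in modalities.items()}
order = sorted(modalities, key=lambda m: -accuracy(solo[m], modalities[m], y))

# ii) Gradual fusion: add modalities to the joint model one at a time,
# strongest first, retraining the fused model at each step.
fused_feats = []
for name in order:
    fused_feats.append(modalities[name])
    w_fused = train_logreg(np.hstack(fused_feats), y)
acc_fused = accuracy(w_fused, np.hstack(fused_feats), y)
```

In this sketch the fusion order is decided by each modality's standalone accuracy, which is one simple reading of "strongest to weakest cross-modality structure"; the paper's actual criterion and fusion layers differ.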

Cite

Text

Neverova et al. "Multi-Scale Deep Learning for Gesture Detection and Localization." European Conference on Computer Vision Workshops, 2014. doi:10.1007/978-3-319-16178-5_33

Markdown

[Neverova et al. "Multi-Scale Deep Learning for Gesture Detection and Localization." European Conference on Computer Vision Workshops, 2014.](https://mlanthology.org/eccvw/2014/neverova2014eccvw-multiscale/) doi:10.1007/978-3-319-16178-5_33

BibTeX

@inproceedings{neverova2014eccvw-multiscale,
  title     = {{Multi-Scale Deep Learning for Gesture Detection and Localization}},
  author    = {Neverova, Natalia and Wolf, Christian and Taylor, Graham W. and Nebout, Florian},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2014},
  pages     = {474--490},
  doi       = {10.1007/978-3-319-16178-5_33},
  url       = {https://mlanthology.org/eccvw/2014/neverova2014eccvw-multiscale/}
}