Gesture and Sign Language Recognition with Temporal Residual Networks
Abstract
Gesture and sign language recognition in a continuous video stream is a challenging task, especially with a large vocabulary. In this work, we approach this as a framewise classification problem. We tackle it using temporal convolutions and recent advances in deep learning such as residual networks, batch normalization and exponential linear units (ELUs). The models are evaluated on three different datasets: the Dutch Sign Language Corpus (Corpus NGT), the Flemish Sign Language Corpus (Corpus VGT) and the ChaLearn LAP RGB-D Continuous Gesture Dataset (ConGD). We achieve a 73.5% top-10 accuracy for 100 signs with the Corpus NGT, 56.4% with the Corpus VGT and a mean Jaccard index of 0.316 with the ChaLearn LAP ConGD without using depth maps.
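The building blocks named in the abstract (temporal convolution, batch normalization, ELU activation, residual skip connection) can be combined into a temporal residual block. The following is an illustrative numpy-only sketch of such a block for a single sequence, not the authors' actual architecture; the normalization here is a simplified per-channel variant computed over time rather than over a mini-batch, and all function names are made up for the example.

```python
import numpy as np

def conv1d(x, w, b):
    """Temporal convolution with 'same' zero padding.
    x: (channels_in, time); w: (channels_out, channels_in, k); b: (channels_out,)"""
    c_out, c_in, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    T = x.shape[1]
    out = np.empty((c_out, T))
    for t in range(T):
        # Dot each filter with the k-frame window ending at t + k
        out[:, t] = np.tensordot(w, xp[:, t:t + k], axes=([1, 2], [0, 1])) + b
    return out

def norm(x, eps=1e-5):
    # Simplified stand-in for batch norm: normalize each channel over time
    return (x - x.mean(axis=1, keepdims=True)) / np.sqrt(x.var(axis=1, keepdims=True) + eps)

def elu(x, alpha=1.0):
    # Exponential linear unit: identity for x > 0, alpha*(exp(x)-1) otherwise
    return np.where(x > 0, x, alpha * (np.exp(x) - 1))

def temporal_residual_block(x, w1, b1, w2, b2):
    """conv -> norm -> ELU -> conv -> norm, then add the input back (skip connection).
    Requires channels_out == channels_in so the residual addition is valid."""
    h = elu(norm(conv1d(x, w1, b1)))
    h = norm(conv1d(h, w2, b2))
    return elu(h + x)
```

Because the convolutions preserve the time axis, the block maps a (channels, frames) array to an array of the same shape, which is what makes framewise classification over a continuous stream straightforward: a per-frame classifier can be applied on top of the final feature maps.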
Cite
Text
Pigou et al. "Gesture and Sign Language Recognition with Temporal Residual Networks." IEEE/CVF International Conference on Computer Vision Workshops, 2017. doi:10.1109/ICCVW.2017.365
Markdown
[Pigou et al. "Gesture and Sign Language Recognition with Temporal Residual Networks." IEEE/CVF International Conference on Computer Vision Workshops, 2017.](https://mlanthology.org/iccvw/2017/pigou2017iccvw-gesture/) doi:10.1109/ICCVW.2017.365
BibTeX
@inproceedings{pigou2017iccvw-gesture,
title = {{Gesture and Sign Language Recognition with Temporal Residual Networks}},
author = {Pigou, Lionel and Van Herreweghe, Mieke and Dambre, Joni},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2017},
pages = {3086-3093},
doi = {10.1109/ICCVW.2017.365},
url = {https://mlanthology.org/iccvw/2017/pigou2017iccvw-gesture/}
}