Mutual Support of Data Modalities in the Task of Sign Language Recognition

Abstract

This paper presents a method for automatic sign language recognition used in the CVPR 2021 ChaLearn Challenge (RGB track). Our method combines several approaches in an ensemble scheme to perform isolated sign-gesture recognition. We fuse multiple modalities: video sample frames processed by a 3D ConvNet (I3D), body-pose information in the form of joint locations processed by a Transformer, hand-region images transformed into a semantic space, and linguistically defined locations of the hands. Although the individual models perform sub-par on their own (60% to 93% accuracy on validation data), the weighted ensemble reaches 95.46% accuracy.
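The fusion described above can be sketched as a weighted average of each model's class-probability vector. This is a minimal illustration only: the specific weights, the three-class toy inputs, and the `weighted_ensemble` helper are assumptions for demonstration, not the paper's tuned ensemble.

```python
import numpy as np

def weighted_ensemble(probs, weights):
    """Fuse per-model class-probability vectors by a weighted average.

    probs:   list of arrays, each of shape (num_classes,), summing to 1.
    weights: one float per model; normalized internally so the fused
             vector is again a valid probability distribution.
    """
    probs = np.asarray(probs, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                         # normalize ensemble weights
    return (w[:, None] * probs).sum(axis=0)

# Toy example: three modality streams disagree individually, but the
# weighted average settles on a consensus class. Weights are illustrative.
p_video = np.array([0.6, 0.3, 0.1])   # e.g. I3D on RGB frames
p_pose  = np.array([0.2, 0.7, 0.1])   # e.g. Transformer on joint locations
p_hand  = np.array([0.5, 0.4, 0.1])   # e.g. hand-region semantic features
fused = weighted_ensemble([p_video, p_pose, p_hand], weights=[0.5, 0.3, 0.2])
pred = int(np.argmax(fused))          # index of the predicted sign class
```

A weighted average is one of the simplest late-fusion rules; it lets a strong modality dominate while weaker modalities still break ties, which matches the paper's observation that individually sub-par models support each other in the ensemble.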

Cite

Text

Gruber et al. "Mutual Support of Data Modalities in the Task of Sign Language Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021. doi:10.1109/CVPRW53098.2021.00381

Markdown

[Gruber et al. "Mutual Support of Data Modalities in the Task of Sign Language Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021.](https://mlanthology.org/cvprw/2021/gruber2021cvprw-mutual/) doi:10.1109/CVPRW53098.2021.00381

BibTeX

@inproceedings{gruber2021cvprw-mutual,
  title     = {{Mutual Support of Data Modalities in the Task of Sign Language Recognition}},
  author    = {Gruber, Ivan and Krnoul, Zdenek and Hrúz, Marek and Kanis, Jakub and Bohacek, Matyas},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2021},
  pages     = {3424--3433},
  doi       = {10.1109/CVPRW53098.2021.00381},
  url       = {https://mlanthology.org/cvprw/2021/gruber2021cvprw-mutual/}
}