Sign Language Translation from Instructional Videos

Abstract

Advances in automatic sign language translation (SLT) into spoken languages have mostly been benchmarked on datasets of limited size and restricted domains. Our work advances the state of the art by providing the first baseline results on How2Sign, a large and broad dataset. We train a Transformer over I3D video features, using the reduced BLEU score as the validation metric instead of the widely used BLEU score. We report a BLEU score of 8.03 and publish the first open-source implementation of its kind to promote further advances.
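For context on the metric mentioned above, the following is a minimal from-scratch sketch of corpus-level BLEU (clipped n-gram precisions combined with a brevity penalty). It is an illustration only, not the paper's evaluation code: the authors validate with a reduced BLEU variant, and published results are typically computed with a standard toolkit rather than a hand-rolled implementation like this one.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100): geometric mean of clipped n-gram
    precisions for n = 1..max_n, times a brevity penalty. No smoothing."""
    matches = [0] * max_n   # clipped n-gram matches, per order
    totals = [0] * max_n    # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp, ref = hyp.split(), ref.split()
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = Counter(ngrams(hyp, n))
            ref_counts = Counter(ngrams(ref, n))
            # Clip each hypothesis n-gram count by its reference count.
            matches[n - 1] += sum(min(c, ref_counts[g])
                                  for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # unsmoothed BLEU is zero if any order has no match
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty discourages short hypotheses.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return 100 * bp * math.exp(log_prec)
```

A reduced BLEU, as used for validation in the paper, restricts which words enter the computation; the scoring machinery itself is the same as above.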

Cite

Text

Tarrés et al. "Sign Language Translation from Instructional Videos." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00596

Markdown

[Tarrés et al. "Sign Language Translation from Instructional Videos." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/tarres2023cvprw-sign/) doi:10.1109/CVPRW59228.2023.00596

BibTeX

@inproceedings{tarres2023cvprw-sign,
  title     = {{Sign Language Translation from Instructional Videos}},
  author    = {Tarrés, Laia and Gállego, Gerard I. and Duarte, Amanda Cardoso and Torres, Jordi and Giró-i-Nieto, Xavier},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2023},
  pages     = {5625--5635},
  doi       = {10.1109/CVPRW59228.2023.00596},
  url       = {https://mlanthology.org/cvprw/2023/tarres2023cvprw-sign/}
}