Automatic Segmentation of Sign Language into Subtitle-Units

Abstract

We present baseline results for a new task of automatic segmentation of Sign Language video into sentence-like units. We use a corpus of natural Sign Language video with accurately aligned subtitles to train a spatio-temporal graph convolutional network with a BiLSTM on 2D skeleton data to automatically detect the temporal boundaries of subtitles. In doing so, we segment Sign Language video into subtitle-units that can be translated into phrases in a written language. We achieve a ROC-AUC statistic of 0.87 at the frame level and 92% label accuracy within a time margin of 0.6s of the true labels.
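The relaxed accuracy metric mentioned above (label accuracy within a 0.6 s margin) can be sketched as follows. This is a hypothetical reimplementation, not the paper's code: the frame rate and the exact matching rule (a frame's predicted label counts as correct if the ground truth contains that label within ±0.6 s) are assumptions for illustration.

```python
def frame_accuracy_with_margin(pred, true, fps=25, margin_s=0.6):
    """Fraction of frames whose predicted label (0 = not a subtitle
    boundary, 1 = boundary) matches the ground-truth label of some frame
    within +/- margin_s seconds. With margin_s = 0 this reduces to plain
    frame-level accuracy."""
    w = int(round(margin_s * fps))  # margin in frames
    n = len(pred)
    correct = 0
    for i in range(n):
        lo, hi = max(0, i - w), min(n, i + w + 1)
        if pred[i] in true[lo:hi]:
            correct += 1
    return correct / n
```

For example, a prediction whose boundary is shifted by one frame relative to the ground truth scores below 1.0 under exact matching but can reach 1.0 once the temporal margin is applied.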

Cite

Text

Bull et al. "Automatic Segmentation of Sign Language into Subtitle-Units." European Conference on Computer Vision Workshops, 2020. doi:10.1007/978-3-030-66096-3_14

Markdown

[Bull et al. "Automatic Segmentation of Sign Language into Subtitle-Units." European Conference on Computer Vision Workshops, 2020.](https://mlanthology.org/eccvw/2020/bull2020eccvw-automatic/) doi:10.1007/978-3-030-66096-3_14

BibTeX

@inproceedings{bull2020eccvw-automatic,
  title     = {{Automatic Segmentation of Sign Language into Subtitle-Units}},
  author    = {Bull, Hannah and Gouiffès, Michèle and Braffort, Annelies},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2020},
  pages     = {186--198},
  doi       = {10.1007/978-3-030-66096-3_14},
  url       = {https://mlanthology.org/eccvw/2020/bull2020eccvw-automatic/}
}