Automatic Segmentation of Sign Language into Subtitle-Units
Abstract
We present baseline results for a new task of automatic segmentation of Sign Language video into sentence-like units. We use a corpus of natural Sign Language video with accurately aligned subtitles to train a spatio-temporal graph convolutional network with a BiLSTM on 2D skeleton data to automatically detect the temporal boundaries of subtitles. In doing so, we segment Sign Language video into subtitle-units that can be translated into phrases in a written language. We achieve a ROC-AUC statistic of 0.87 at the frame level and 92% label accuracy within a time margin of 0.6s of the true labels.
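The abstract outlines the model's general shape: a spatio-temporal graph convolution over 2D skeleton keypoints, a BiLSTM over the resulting frame features, and a per-frame prediction of subtitle boundaries. Below is a minimal, hypothetical PyTorch sketch of that pipeline; the class names, layer sizes, joint count, and adjacency construction are illustrative assumptions, not the authors' exact configuration.

```python
# Hypothetical sketch of an ST-GCN + BiLSTM frame-level boundary detector.
# Layer widths, the 25-joint skeleton, and the adjacency matrix are assumed
# for illustration; the paper's actual architecture may differ.
import torch
import torch.nn as nn

class STGCNBlock(nn.Module):
    """One spatial graph convolution (joints mixed via A) + temporal conv."""
    def __init__(self, in_ch, out_ch, num_joints, t_kernel=9):
        super().__init__()
        # Placeholder normalized adjacency; a real model would build A from
        # the skeleton's bone connections.
        A = torch.eye(num_joints) + torch.ones(num_joints, num_joints) / num_joints
        self.register_buffer("A", A / A.sum(dim=1, keepdim=True))
        self.spatial = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.temporal = nn.Conv2d(out_ch, out_ch, kernel_size=(t_kernel, 1),
                                  padding=(t_kernel // 2, 0))
        self.relu = nn.ReLU()

    def forward(self, x):  # x: (batch, channels, frames, joints)
        x = self.spatial(x)
        x = torch.einsum("nctv,vw->nctw", x, self.A)  # mix joints along graph
        return self.relu(self.temporal(x))

class SegmentationNet(nn.Module):
    def __init__(self, num_joints=25, hidden=64):
        super().__init__()
        self.gcn = nn.Sequential(
            STGCNBlock(2, 32, num_joints),   # 2 input channels: (x, y) coords
            STGCNBlock(32, hidden, num_joints),
        )
        self.lstm = nn.LSTM(hidden * num_joints, hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # per-frame boundary logit

    def forward(self, x):  # x: (batch, 2, frames, joints)
        f = self.gcn(x)    # (batch, hidden, frames, joints)
        n, c, t, v = f.shape
        f = f.permute(0, 2, 1, 3).reshape(n, t, c * v)
        f, _ = self.lstm(f)
        return torch.sigmoid(self.head(f)).squeeze(-1)  # (batch, frames)

# Example: score 200 frames of a 25-joint 2D skeleton sequence.
model = SegmentationNet()
probs = model(torch.randn(1, 2, 200, 25))  # per-frame boundary probabilities
```

Framed this way, subtitle segmentation reduces to frame-level binary classification (boundary vs. non-boundary), which is consistent with the frame-level ROC-AUC and margin-based label accuracy the abstract reports.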
Cite
Text

Bull et al. "Automatic Segmentation of Sign Language into Subtitle-Units." European Conference on Computer Vision Workshops, 2020. doi:10.1007/978-3-030-66096-3_14

Markdown

[Bull et al. "Automatic Segmentation of Sign Language into Subtitle-Units." European Conference on Computer Vision Workshops, 2020.](https://mlanthology.org/eccvw/2020/bull2020eccvw-automatic/) doi:10.1007/978-3-030-66096-3_14

BibTeX
@inproceedings{bull2020eccvw-automatic,
title = {{Automatic Segmentation of Sign Language into Subtitle-Units}},
author = {Bull, Hannah and Gouiffès, Michèle and Braffort, Annelies},
booktitle = {European Conference on Computer Vision Workshops},
year = {2020},
pages = {186--198},
doi = {10.1007/978-3-030-66096-3_14},
url = {https://mlanthology.org/eccvw/2020/bull2020eccvw-automatic/}
}