Hierarchical I3D for Sign Spotting
Abstract
Most vision-based sign language research to date has focused on Isolated Sign Language Recognition (ISLR), where the objective is to predict a single sign class given a short video clip. Although there has been significant progress in ISLR, its real-life applications are limited. In this paper, we focus instead on the challenging task of Sign Spotting, where the goal is to simultaneously identify and localise signs in continuous co-articulated sign videos. To address the limitations of current ISLR-based models, we propose a hierarchical sign spotting approach that learns coarse-to-fine spatio-temporal sign features, taking advantage of representations at various temporal levels to provide more precise sign localisation. Specifically, we develop the Hierarchical Sign I3D model (HS-I3D), which consists of a hierarchical network head attached to the existing spatio-temporal I3D model to exploit features at different layers of the network. We evaluate HS-I3D on the MSSL track of the ChaLearn 2022 Sign Spotting Challenge and achieve a state-of-the-art F1 score of 0.607, making it the top-1 winning solution of the competition.
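Sign spotting results such as the F1 score above are obtained by matching predicted sign intervals against annotated ones. The sketch below illustrates one common way to do this, assuming a prediction counts as correct when it has the same class as an unmatched ground-truth sign and their temporal intersection-over-union (IoU) reaches a threshold. The function names, the `(class_id, start, end)` tuple layout, the greedy matching, and the 0.5 threshold are all illustrative assumptions; the abstract does not specify the challenge's exact evaluation protocol.

```python
from typing import List, Tuple

# Hypothetical representation: (class_id, start, end). Units (frames or ms) are an assumption.
Sign = Tuple[int, float, float]


def temporal_iou(a: Sign, b: Sign) -> float:
    """Intersection-over-union of two temporal intervals."""
    inter = max(0.0, min(a[2], b[2]) - max(a[1], b[1]))
    union = (a[2] - a[1]) + (b[2] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def spotting_f1(preds: List[Sign], truths: List[Sign], iou_threshold: float = 0.5) -> float:
    """F1 over greedily matched (same class, IoU >= threshold) intervals."""
    matched = set()  # indices of ground-truth signs already claimed
    tp = 0
    for p in preds:
        best_j, best_iou = None, iou_threshold
        for j, t in enumerate(truths):
            if j in matched or t[0] != p[0]:
                continue
            iou = temporal_iou(p, t)
            if iou >= best_iou:
                best_j, best_iou = j, iou
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no matching annotation
    fn = len(truths) - tp  # annotations that were never spotted
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    truths = [(1, 0, 10), (2, 20, 30)]
    preds = [(1, 1, 10), (2, 40, 50), (3, 20, 30)]
    print(spotting_f1(preds, truths))
```

In this toy case only the first prediction matches: the second has the right class but no temporal overlap, and the third overlaps perfectly but has the wrong class. This shows why spotting requires both identification and localisation to be correct.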
Cite
Text
Wong et al. "Hierarchical I3D for Sign Spotting." European Conference on Computer Vision Workshops, 2022. doi:10.1007/978-3-031-25085-9_14
Markdown
[Wong et al. "Hierarchical I3D for Sign Spotting." European Conference on Computer Vision Workshops, 2022.](https://mlanthology.org/eccvw/2022/wong2022eccvw-hierarchical/) doi:10.1007/978-3-031-25085-9_14
BibTeX
@inproceedings{wong2022eccvw-hierarchical,
title = {{Hierarchical I3D for Sign Spotting}},
author = {Wong, Ryan and Camgöz, Necati Cihan and Bowden, Richard},
booktitle = {European Conference on Computer Vision Workshops},
year = {2022},
pages = {243--255},
doi = {10.1007/978-3-031-25085-9_14},
url = {https://mlanthology.org/eccvw/2022/wong2022eccvw-hierarchical/}
}