Multi-Modal Sign Language Spotting by Multi/One-Shot Learning

Abstract

The sign spotting task aims to identify whether and where an isolated sign of interest occurs in a continuous sign language video. It has recently received substantial attention as a promising tool for annotating large-scale sign language data. Previous methods used multiple sources of available supervision to localize sign actions in the RGB domain. However, these methods overlook the complementary nature of different modalities, i.e., RGB, optical flow, and pose, all of which benefit the sign spotting task. To this end, we propose a framework that merges multiple modalities for multiple-shot supervised learning. Furthermore, we explore the sign spotting task in the one-shot setting, which needs fewer annotations and has broader applications. To evaluate our approach, we participated in the Sign Spotting Challenge, organized by ECCV 2022. The competition contains two tracks, i.e., multiple-shot supervised learning (MSSL, track 1) and one-shot learning with weak labels (OSLWL, track 2). In track 1, our method achieves an F1-score of around 0.566 and is ranked 2nd. In track 2, we are ranked 1st, with an F1-score of 0.6. These results demonstrate the effectiveness of our proposed method. We hope our solution will provide some insight for future research in the community.

Cite

Text

Liu et al. "Multi-Modal Sign Language Spotting by Multi/One-Shot Learning." European Conference on Computer Vision Workshops, 2022. doi:10.1007/978-3-031-25085-9_15

Markdown

[Liu et al. "Multi-Modal Sign Language Spotting by Multi/One-Shot Learning." European Conference on Computer Vision Workshops, 2022.](https://mlanthology.org/eccvw/2022/liu2022eccvw-multimodal/) doi:10.1007/978-3-031-25085-9_15

BibTeX

@inproceedings{liu2022eccvw-multimodal,
  title     = {{Multi-Modal Sign Language Spotting by Multi/One-Shot Learning}},
  author    = {Liu, Landong and Zhou, Wengang and Zhao, Weichao and Hu, Hezhen and Li, Houqiang},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2022},
  pages     = {256--270},
  doi       = {10.1007/978-3-031-25085-9_15},
  url       = {https://mlanthology.org/eccvw/2022/liu2022eccvw-multimodal/}
}