TCNet: Continuous Sign Language Recognition from Trajectories and Correlated Regions

AAAI 2024 pp. 3891-3899

doi:10.1609/AAAI.V38I4.28181

Abstract

A key challenge in continuous sign language recognition (CSLR) is to efficiently capture long-range spatial interactions over time from the video input. To address this challenge, we propose TCNet, a hybrid network that effectively models spatio-temporal information from Trajectories and Correlated regions. TCNet's trajectory module transforms frames into aligned trajectories composed of continuous visual tokens. This facilitates extracting region trajectory patterns. In addition, for a query token, self-attention is learned along the trajectory. As such, our network can also focus on fine-grained spatio-temporal patterns, such as finger movement, of a region in motion. TCNet's correlation module utilizes a novel dynamic attention mechanism that filters out irrelevant frame regions. Additionally, it assigns dynamic key-value tokens from correlated regions to each query. Both innovations significantly reduce the computation cost and memory. We perform experiments on four large-scale datasets: PHOENIX14, PHOENIX14-T, CSL, and CSL-Daily. Our results demonstrate that TCNet consistently achieves state-of-the-art performance. For example, we improve over the previous state-of-the-art by 1.5% and 1.0% word error rate on PHOENIX14 and PHOENIX14-T, respectively. Code is available at https://github.com/hotfinda/TCNet
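
The results above are reported as word error rate (WER), where lower is better. As a reminder of how that metric is usually computed, here is a minimal sketch, assuming whitespace-separated gloss tokens and the standard definition WER = (substitutions + deletions + insertions) / reference length. The function name `word_error_rate` and the example glosses are hypothetical and are not taken from the TCNet code base, whose evaluation scripts may differ in detail.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: token-level Levenshtein distance / reference length.

    Illustrative only; not the evaluation script used by the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis costs i deletions
        for j, hyp_tok in enumerate(hyp, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical gloss sequences: one substitution in three reference tokens.
wer = word_error_rate("MORGEN REGEN NORD", "MORGEN SONNE NORD")
```

Because insertions are counted against the reference length, WER can exceed 1.0 when the hypothesis is much longer than the reference.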

Cite

Text

Lu et al. "TCNet: Continuous Sign Language Recognition from Trajectories and Correlated Regions." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I4.28181

Markdown

[Lu et al. "TCNet: Continuous Sign Language Recognition from Trajectories and Correlated Regions." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/lu2024aaai-tcnet/) doi:10.1609/AAAI.V38I4.28181

BibTeX

@inproceedings{lu2024aaai-tcnet,
  title     = {{TCNet: Continuous Sign Language Recognition from Trajectories and Correlated Regions}},
  author    = {Lu, Hui and Salah, Albert Ali and Poppe, Ronald},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {3891--3899},
  doi       = {10.1609/AAAI.V38I4.28181},
  url       = {https://mlanthology.org/aaai/2024/lu2024aaai-tcnet/}
}