Recognizing Skeleton-Based Hand Gestures by a Spatio-Temporal Network

Abstract

A key challenge in skeleton-based hand gesture recognition is that a gesture can often be performed in several different ways, each with its own configuration of poses and spatio-temporal dependencies. This leads us to define a spatio-temporal network model that explicitly characterizes these internal pose configurations and their local spatio-temporal dependencies. The model introduces a latent vector variable, derived from the coordinate embedding, to characterize the fine-grained configurations among the joints of a particular hand gesture. Furthermore, an attention scorer is devised to exchange joint-pose information in the encoder structure, so that all local spatio-temporal dependencies are globally consistent. Empirical evaluations on two benchmark datasets and one in-house dataset suggest that our approach significantly outperforms state-of-the-art methods.
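The abstract does not specify the form of the coordinate embedding or the attention scorer, so the following is only a minimal sketch of the general idea, not the authors' architecture. It assumes a linear embedding of 3D joint coordinates and standard scaled dot-product self-attention across the joints of each frame. The function names `embed_coordinates` and `joint_attention`, the weight matrices, and the sizes `T`, `J`, and `D` are all hypothetical and chosen for illustration.

```python
import numpy as np

def embed_coordinates(joints, w_embed):
    """Linearly project raw joint coordinates (T, J, 3) to embeddings (T, J, D)."""
    return joints @ w_embed

def joint_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over the joints of each frame.

    x has shape (T, J, D). Returns the attended features (T, J, D) and the
    attention weights (T, J, J), where row i says how much joint i draws
    from every other joint in the same frame.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Illustrative sizes only (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
T, J, D = 8, 22, 16  # frames, joints per frame, embedding width
joints = rng.normal(size=(T, J, 3))  # random stand-in for a skeleton sequence
w_embed = rng.normal(size=(3, D))
w_q, w_k, w_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))

x = embed_coordinates(joints, w_embed)
out, weights = joint_attention(x, w_q, w_k, w_v)
```

In this sketch every joint's output is a weighted mix of all joints in the same frame, which is one common way to let local joint features exchange information. A temporal counterpart would apply the same operation along the frame axis instead.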

Cite

Text

Li et al. "Recognizing Skeleton-Based Hand Gestures by a Spatio-Temporal Network." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021. doi:10.1007/978-3-030-86514-6_10

Markdown

[Li et al. "Recognizing Skeleton-Based Hand Gestures by a Spatio-Temporal Network." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2021.](https://mlanthology.org/ecmlpkdd/2021/li2021ecmlpkdd-recognizing/) doi:10.1007/978-3-030-86514-6_10

BibTeX

@inproceedings{li2021ecmlpkdd-recognizing,
  title     = {{Recognizing Skeleton-Based Hand Gestures by a Spatio-Temporal Network}},
  author    = {Li, Xin and Liao, Jun and Liu, Li},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2021},
  pages     = {151--167},
  doi       = {10.1007/978-3-030-86514-6_10},
  url       = {https://mlanthology.org/ecmlpkdd/2021/li2021ecmlpkdd-recognizing/}
}