OO-dMVMT: A Deep Multi-View Multi-Task Classification Framework for Real-Time 3D Hand Gesture Classification and Segmentation

Abstract

Continuous mid-air hand gesture recognition from captured hand pose streams is fundamental for human-computer interaction, particularly in AR/VR. However, many of the methods proposed to recognize heterogeneous hand gestures are tested only on the classification task, and real-time, low-latency gesture segmentation in a continuous stream is not well addressed in the literature. For this task, we propose the On-Off deep Multi-View Multi-Task paradigm (OO-dMVMT). The idea is to exploit multiple time-local views related to hand pose and movement to generate rich gesture descriptions, and to combine them with heterogeneous tasks to achieve high accuracy. OO-dMVMT extends the classical MVMT paradigm, in which all tasks must be active at all times, by allowing specific tasks to switch on or off depending on whether they apply to the input. We show that OO-dMVMT sets a new state of the art (SotA) on continuous/online 3D skeleton-based gesture recognition in terms of gesture classification accuracy, segmentation accuracy, false positives, and decision latency, while maintaining real-time operation.
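One way to picture the on/off idea is as a per-sample mask on a multi-task loss: a task whose target is undefined for a given input contributes nothing for that input, while always-on tasks are trained on every input. The Python sketch below illustrates only that general idea and is not the authors' implementation. The task pairing (an always-on classification head plus a regression head that is active only on windows containing a gesture), the names `Sample`, `on_off_loss`, and `offset_target`, and the loss weights are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Sample:
    """Hypothetical targets for one time window of a hand pose stream."""
    label: int                      # class index; e.g. 0 could mean "no gesture"
    offset_target: Optional[float]  # hypothetical; defined only when a gesture is present


def cross_entropy(probs: List[float], label: int) -> float:
    # Negative log-likelihood of the true class, clamped to avoid log(0).
    return -math.log(max(probs[label], 1e-12))


def on_off_loss(
    batch: List[Sample],
    class_probs: List[List[float]],
    offset_preds: List[float],
    weights: Dict[str, float],
) -> Tuple[float, int]:
    """Combine an always-on task with a task that switches off per sample.

    Returns the weighted total loss and the number of samples for which
    the switchable (regression) task was active.
    """
    # Always-on task: classification is evaluated on every window.
    cls_terms = [cross_entropy(p, s.label) for p, s in zip(class_probs, batch)]
    cls_loss = sum(cls_terms) / len(cls_terms)

    # On/off task: squared error only where a regression target exists;
    # predictions for windows without a target are ignored entirely.
    reg_terms = [
        (pred - s.offset_target) ** 2
        for pred, s in zip(offset_preds, batch)
        if s.offset_target is not None
    ]
    # If no sample switches the task on, it contributes zero instead of
    # dividing by zero.
    reg_loss = sum(reg_terms) / len(reg_terms) if reg_terms else 0.0

    total = weights["cls"] * cls_loss + weights["reg"] * reg_loss
    return total, len(reg_terms)


if __name__ == "__main__":
    batch = [Sample(label=0, offset_target=None), Sample(label=2, offset_target=0.5)]
    probs = [[0.9, 0.05, 0.05], [0.1, 0.1, 0.8]]
    preds = [123.0, 0.7]  # the first prediction is never penalized
    total, active = on_off_loss(batch, probs, preds, {"cls": 1.0, "reg": 0.5})
    print(f"total={total:.4f}, regression active on {active} sample(s)")
```

Masking per sample, rather than disabling a task for the whole batch, lets a single batch mix windows with and without gestures while each task only learns from inputs it applies to.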

Cite

Text

Cunico et al. "OO-dMVMT: A Deep Multi-View Multi-Task Classification Framework for Real-Time 3D Hand Gesture Classification and Segmentation." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00275

Markdown

[Cunico et al. "OO-dMVMT: A Deep Multi-View Multi-Task Classification Framework for Real-Time 3D Hand Gesture Classification and Segmentation." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/cunico2023cvprw-oodmvmt/) doi:10.1109/CVPRW59228.2023.00275

BibTeX

@inproceedings{cunico2023cvprw-oodmvmt,
  title     = {{OO-dMVMT: A Deep Multi-View Multi-Task Classification Framework for Real-Time 3D Hand Gesture Classification and Segmentation}},
  author    = {Cunico, Federico and Girella, Federico and Avogaro, Andrea and Emporio, Marco and Giachetti, Andrea and Cristani, Marco},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2023},
  pages     = {2745--2754},
  doi       = {10.1109/CVPRW59228.2023.00275},
  url       = {https://mlanthology.org/cvprw/2023/cunico2023cvprw-oodmvmt/}
}