Skeleton-Based Action Recognition of People Handling Objects

Abstract

In visual surveillance systems, it is necessary to recognize the behavior of people handling objects such as a phone, a cup, or a plastic bag. To address this problem, we propose a new framework that recognizes object-related human actions with graph convolutional networks using human and object poses. In this framework, we construct skeletal graphs of reliable human poses by selectively sampling the informative frames in a video, namely those that include human joints with high confidence scores obtained in pose estimation. The skeletal graphs generated from the sampled frames represent human poses related to the object position in both the spatial and temporal domains, and these graphs are used as inputs to the graph convolutional networks. Experiments on an open benchmark and our own data sets show that our method outperforms the state-of-the-art method for skeleton-based action recognition, which verifies the validity of our framework.
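
The frame-sampling idea in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not specify the selection rule, so the function name `select_informative_frames`, the `joint_threshold` parameter, and the ranking criterion (count of joints above the threshold, ties broken by mean confidence) are all assumptions made for illustration.

```python
from typing import List, Sequence


def select_informative_frames(
    confidences: Sequence[Sequence[float]],
    num_samples: int,
    joint_threshold: float = 0.5,  # hypothetical cutoff, not from the paper
) -> List[int]:
    """Pick up to `num_samples` frame indices with the most reliable joints.

    `confidences[t][j]` is the pose-estimation confidence of joint j in frame t.
    The ranking rule below is an assumed stand-in for the paper's sampling.
    """

    def score(frame_conf: Sequence[float]):
        # Number of joints whose confidence reaches the threshold.
        reliable = sum(1 for c in frame_conf if c >= joint_threshold)
        # Mean confidence is used only to break ties between frames.
        mean_conf = sum(frame_conf) / len(frame_conf) if frame_conf else 0.0
        return (reliable, mean_conf)

    ranked = sorted(
        range(len(confidences)),
        key=lambda t: score(confidences[t]),
        reverse=True,
    )
    # Return the chosen frames in temporal order so that a temporal graph
    # built from them keeps the original ordering.
    return sorted(ranked[:num_samples])


# Toy example: 4 frames, 3 joints each.
toy_confidences = [
    [0.9, 0.8, 0.7],
    [0.1, 0.2, 0.9],
    [0.6, 0.6, 0.6],
    [0.0, 0.0, 0.0],
]
selected = select_informative_frames(toy_confidences, num_samples=2)
```

In this toy input, the first and third frames have all three joints above the assumed threshold, so they are the ones kept. The skeletal graphs fed to the graph convolutional network would then be built only from the selected frames.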

Cite

Text

Kim et al. "Skeleton-Based Action Recognition of People Handling Objects." IEEE/CVF Winter Conference on Applications of Computer Vision, 2019. doi:10.1109/WACV.2019.00014

Markdown

[Kim et al. "Skeleton-Based Action Recognition of People Handling Objects." IEEE/CVF Winter Conference on Applications of Computer Vision, 2019.](https://mlanthology.org/wacv/2019/kim2019wacv-skeleton/) doi:10.1109/WACV.2019.00014

BibTeX

@inproceedings{kim2019wacv-skeleton,
  title     = {{Skeleton-Based Action Recognition of People Handling Objects}},
  author    = {Kim, Sunoh and Yun, Kimin and Park, Jongyoul and Choi, Jin Young},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
  year      = {2019},
  pages     = {61-70},
  doi       = {10.1109/WACV.2019.00014},
  url       = {https://mlanthology.org/wacv/2019/kim2019wacv-skeleton/}
}