3DInAction: Understanding Human Actions in 3D Point Clouds

Abstract

We propose a novel method for 3D point cloud action recognition. Understanding human actions in RGB videos has been widely studied in recent years; however, its 3D point cloud counterpart remains under-explored, despite the clear value that 3D information may bring. This is mostly due to the inherent limitations of the point cloud data modality---lack of structure, permutation invariance, and a varying number of points---which make it difficult to learn a spatio-temporal representation. To address these limitations, we propose the 3DInAction pipeline, which first estimates patches moving in time (t-patches) as a key building block, alongside a hierarchical architecture that learns an informative spatio-temporal representation. We show that our method achieves improved performance on existing datasets, including DFAUST and IKEA ASM. Code is publicly available at https://github.com/sitzikbs/3dincaction.
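To make the t-patch idea concrete, below is a minimal, hypothetical sketch of one way temporally coherent patches could be extracted from a point cloud sequence: anchors are seeded in the first frame with farthest point sampling and propagated to later frames by nearest-neighbor matching, with a k-NN patch gathered around each propagated anchor. The function names and parameters here are illustrative assumptions, not the paper's exact procedure; see the repository for the authors' implementation.

```python
# Hypothetical sketch of t-patch extraction (not the authors' exact procedure).
# Input: a sequence of unordered (N, 3) point cloud frames.
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Pick n_samples well-spread anchor points from an (N, 3) array."""
    selected = [0]
    dists = np.linalg.norm(points - points[0], axis=1)
    for _ in range(n_samples - 1):
        idx = int(np.argmax(dists))
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(points - points[idx], axis=1))
    return points[selected]

def knn(points, query, k):
    """Indices of the k nearest neighbors of `query` in (N, 3) `points`."""
    d = np.linalg.norm(points - query, axis=1)
    return np.argsort(d)[:k]

def extract_t_patches(sequence, n_patches=8, k=32):
    """Build (n_patches, T, k, 3) temporal patches from a list of (N, 3) frames."""
    anchors = farthest_point_sampling(sequence[0], n_patches)
    patches = []
    for anchor in anchors:
        track, center = [], anchor
        for frame in sequence:
            # Propagate the anchor to the current frame, then gather a local patch.
            center = frame[knn(frame, center, 1)[0]]
            track.append(frame[knn(frame, center, k)])
        patches.append(np.stack(track))
    return np.stack(patches)

# Toy usage: 4 frames of 256 random points each.
seq = [np.random.rand(256, 3) for _ in range(4)]
t_patches = extract_t_patches(seq, n_patches=8, k=32)
print(t_patches.shape)  # (8, 4, 32, 3)
```

In a full pipeline, each t-patch would then be fed to a hierarchical network that aggregates features within patches and across time to produce the spatio-temporal representation used for action classification.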

Cite

Text

Ben-Shabat et al. "3DInAction: Understanding Human Actions in 3D Point Clouds." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01888

Markdown

[Ben-Shabat et al. "3DInAction: Understanding Human Actions in 3D Point Clouds." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/benshabat2024cvpr-3dinaction/) doi:10.1109/CVPR52733.2024.01888

BibTeX

@inproceedings{benshabat2024cvpr-3dinaction,
  title     = {{3DInAction: Understanding Human Actions in 3D Point Clouds}},
  author    = {Ben-Shabat, Yizhak and Shrout, Oren and Gould, Stephen},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {19978--19987},
  doi       = {10.1109/CVPR52733.2024.01888},
  url       = {https://mlanthology.org/cvpr/2024/benshabat2024cvpr-3dinaction/}
}