Temporal Poselets for Collective Activity Detection and Recognition

Abstract

Detection and recognition of collective human activities are important modules of any system devoted to high-level social behavior analysis. In this paper, we present a novel semantic-based spatio-temporal descriptor that can cope with several interacting people at different scales and with multiple activities in a video. Our descriptor is suitable for modelling human motion interaction in crowded environments, the scenario most difficult to analyse because of occlusions. In particular, we extend the Poselet detector approach by defining a descriptor, named TPOS, based on Poselet activation patterns over time. We show that this descriptor can effectively tackle complex real scenarios, making it possible to detect humans in the scene, to localize human activities in space-time, and to perform collective group activity recognition in a joint manner, reaching state-of-the-art results.

Cite

Text

Nabi et al. "Temporal Poselets for Collective Activity Detection and Recognition." IEEE/CVF International Conference on Computer Vision Workshops, 2013. doi:10.1109/ICCVW.2013.71

Markdown

[Nabi et al. "Temporal Poselets for Collective Activity Detection and Recognition." IEEE/CVF International Conference on Computer Vision Workshops, 2013.](https://mlanthology.org/iccvw/2013/nabi2013iccvw-temporal/) doi:10.1109/ICCVW.2013.71

BibTeX

@inproceedings{nabi2013iccvw-temporal,
  title     = {{Temporal Poselets for Collective Activity Detection and Recognition}},
  author    = {Nabi, Moin and Del Bue, Alessio and Murino, Vittorio},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2013},
  pages     = {500--507},
  doi       = {10.1109/ICCVW.2013.71},
  url       = {https://mlanthology.org/iccvw/2013/nabi2013iccvw-temporal/}
}