Spatio-Temporal Human-Object Interactions for Action Recognition in Videos

Abstract

We introduce a new method for representing the dynamics of human-object interactions in videos. Previous algorithms tend to focus on modeling the spatial relationships between objects and actors, but ignore how this relationship evolves through time. Our algorithm captures the dynamic nature of human-object interactions by modeling how these patterns evolve over time. Our experiments show that encoding such temporal evolution is crucial for correctly discriminating human actions that involve similar objects and spatial human-object relationships, but differ only in the temporal aspect of the interaction, e.g. answer phone and dial phone. We validate our approach on two human activity datasets and show performance improvements over competing state-of-the-art representations.

Cite

Text

Escorcia and Niebles. "Spatio-Temporal Human-Object Interactions for Action Recognition in Videos." IEEE/CVF International Conference on Computer Vision Workshops, 2013. doi:10.1109/ICCVW.2013.72

Markdown

[Escorcia and Niebles. "Spatio-Temporal Human-Object Interactions for Action Recognition in Videos." IEEE/CVF International Conference on Computer Vision Workshops, 2013.](https://mlanthology.org/iccvw/2013/escorcia2013iccvw-spatiotemporal/) doi:10.1109/ICCVW.2013.72

BibTeX

@inproceedings{escorcia2013iccvw-spatiotemporal,
  title     = {{Spatio-Temporal Human-Object Interactions for Action Recognition in Videos}},
  author    = {Escorcia, Victor and Niebles, Juan Carlos},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2013},
  pages     = {508--514},
  doi       = {10.1109/ICCVW.2013.72},
  url       = {https://mlanthology.org/iccvw/2013/escorcia2013iccvw-spatiotemporal/}
}