Spatio-Temporal Action Detection and Localization Using a Hierarchical LSTM

Abstract

Video analysis has gained importance in recent years due to its usefulness in a wide variety of applications. The efficiency of a video analytics engine depends primarily on its ability to extract spatio-temporal features that are sufficiently discriminative. Inspired by the way the human visual system operates, we propose a hierarchical architecture that captures spatio-temporal information from an input video at different time scales. The fundamental unit of the proposed architecture is a 3D Inception module followed by two layers of modified Convolutional Long Short-Term Memory (ConvLSTM). At each level, we consolidate the LSTM cell and hidden states and pass them to the next level using a visual attention-based pooling approach. The proposed network is applied to video action detection and localization, a foundational element of video analysis. Experiments on the UCF101 and AVA datasets show that the recognition accuracy achieved by the proposed algorithm advances the state of the art in spatio-temporal action detection and localization.
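
The abstract does not spell out how states are consolidated between levels, so the following is only a minimal sketch of generic soft-attention pooling over time, not the authors' implementation. It assumes each hidden state has been flattened to a plain vector, and the names `attention_pool`, `pool_level`, `query`, and `window` are hypothetical, introduced here for illustration.

```python
import math

def attention_pool(states, query):
    """Collapse a sequence of state vectors into one vector.

    states: list of T vectors (each a list of D floats), e.g. flattened hidden states
    query:  list of D floats used to score each state (hypothetical learned parameter)
    """
    # Score each time step by its dot product with the query.
    scores = [sum(s_i * q_i for s_i, q_i in zip(state, query)) for state in states]
    # Softmax over time, shifted by the max score for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the states: one pooled vector for the next level.
    dim = len(states[0])
    pooled = [sum(w * state[d] for w, state in zip(weights, states)) for d in range(dim)]
    return pooled, weights

def pool_level(states, query, window):
    """Pool non-overlapping windows so the next level runs at a coarser time scale."""
    # Assumption: fixed-size, non-overlapping windows; the paper may group states differently.
    return [attention_pool(states[i:i + window], query)[0]
            for i in range(0, len(states), window)]
```

Calling `pool_level` with a window of 2 on four states yields two pooled states, which illustrates how each level of such a hierarchy could see the video at a coarser temporal resolution than the one below it.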

Cite

Text

Ramaswamy et al. "Spatio-Temporal Action Detection and Localization Using a Hierarchical LSTM." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020. doi:10.1109/CVPRW50498.2020.00390

Markdown

[Ramaswamy et al. "Spatio-Temporal Action Detection and Localization Using a Hierarchical LSTM." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020.](https://mlanthology.org/cvprw/2020/ramaswamy2020cvprw-spatiotemporal/) doi:10.1109/CVPRW50498.2020.00390

BibTeX

@inproceedings{ramaswamy2020cvprw-spatiotemporal,
  title     = {{Spatio-Temporal Action Detection and Localization Using a Hierarchical LSTM}},
  author    = {Ramaswamy, Akshaya and Seemakurthy, Karthik and Gubbi, Jayavardhana and Purushothaman, Balamuralidhar},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2020},
  pages     = {3303-3312},
  doi       = {10.1109/CVPRW50498.2020.00390},
  url       = {https://mlanthology.org/cvprw/2020/ramaswamy2020cvprw-spatiotemporal/}
}