Modeling Temporal Structure with LSTM for Online Action Detection

Abstract

© 2018 IEEE. Online action detection is a challenging problem: A system needs to decide what action is happening at the current frame, based on previous frames only. Fortunately, in real life, human actions are not independent of one another: There are strong (long-term) dependencies between them. An online action detection method should be able to capture these dependencies to enable more accurate early detection. At first sight, an LSTM seems very suitable for this problem. It is able to model both short-term and long-term patterns. It takes its input one frame at a time, updates its internal state, and outputs the current class probabilities. In practice, however, the detection results obtained with LSTMs are still quite low. In this work, we start from the hypothesis that it may be too difficult for an LSTM to learn both the interpretation of the input and the temporal patterns at the same time. We propose a two-stream feedback network, where one stream processes the input and the other models the temporal relations. We show improved detection accuracy on an artificial toy dataset and on the Breakfast Dataset [21] and the TVSeries Dataset [7], real-life datasets with inherent temporal dependencies between the actions.
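
The per-frame loop the abstract attributes to a plain LSTM (take one frame, update the internal state, output the current class probabilities) can be sketched as follows. This is only an illustration of that baseline setting, not the two-stream feedback network proposed in the paper. The class name `OnlineLSTMDetector`, the dimensions, and the random untrained weights are all assumptions made for the example, and biases are omitted for brevity.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class OnlineLSTMDetector:
    """Hypothetical toy detector: a single LSTM cell with a softmax readout."""

    def __init__(self, feat_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        # Stacked weights for the input, forget, candidate and output gates.
        self.W = rand_matrix(4 * hidden_dim, feat_dim + hidden_dim)
        self.W_out = rand_matrix(num_classes, hidden_dim)
        # Internal state carried from frame to frame.
        self.h = [0.0] * hidden_dim
        self.c = [0.0] * hidden_dim

    def step(self, frame_feat):
        """Consume one frame's features and return the current class probabilities."""
        n = self.hidden_dim
        z = matvec(self.W, list(frame_feat) + self.h)
        i, f, g, o = z[:n], z[n:2 * n], z[2 * n:3 * n], z[3 * n:]
        self.c = [
            sigmoid(fk) * ck + sigmoid(ik) * math.tanh(gk)
            for ik, fk, gk, ck in zip(i, f, g, self.c)
        ]
        self.h = [sigmoid(ok) * math.tanh(ck) for ok, ck in zip(o, self.c)]
        return softmax(matvec(self.W_out, self.h))


if __name__ == "__main__":
    rng = random.Random(1)
    # Six frames of made-up 8-dimensional features.
    frames = [[rng.random() for _ in range(8)] for _ in range(6)]
    detector = OnlineLSTMDetector(feat_dim=8, hidden_dim=4, num_classes=3)
    for t, frame in enumerate(frames):
        probs = detector.step(frame)  # depends on frames 0..t only
        print(t, [round(p, 3) for p in probs])
```

Because `step` sees only the current frame and the state accumulated from earlier frames, the prediction at frame t cannot depend on later frames, which is the constraint that makes the detection online.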

Cite

Text

De Geest and Tuytelaars. "Modeling Temporal Structure with LSTM for Online Action Detection." IEEE/CVF Winter Conference on Applications of Computer Vision, 2018. doi:10.1109/WACV.2018.00173

Markdown

[De Geest and Tuytelaars. "Modeling Temporal Structure with LSTM for Online Action Detection." IEEE/CVF Winter Conference on Applications of Computer Vision, 2018.](https://mlanthology.org/wacv/2018/geest2018wacv-modeling/) doi:10.1109/WACV.2018.00173

BibTeX

@inproceedings{geest2018wacv-modeling,
  title     = {{Modeling Temporal Structure with LSTM for Online Action Detection}},
  author    = {De Geest, Roeland and Tuytelaars, Tinne},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
  year      = {2018},
  pages     = {1549-1557},
  doi       = {10.1109/WACV.2018.00173},
  url       = {https://mlanthology.org/wacv/2018/geest2018wacv-modeling/}
}