Temporal Reasoning in Videos Using Convolutional Gated Recurrent Units
Abstract
Recently, deep-learning-based models have pushed the state of the art for the task of action recognition in videos. Yet, for many action recognition datasets like Kinetics and UCF101, the correct temporal order of frames does not seem to be essential to solving the task. We find that temporal order matters more for the recently introduced 20BN Something-Something dataset, where the task of fine-grained action recognition requires the model to perform temporal reasoning. We show that when temporal order matters, recurrent models can provide a significant boost in performance. Using qualitative methods, we show that when the task of action recognition requires temporal reasoning, the hidden states of the recurrent units encode meaningful state transitions.
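The convolutional gated recurrent unit named in the title replaces the fully connected products of a standard GRU with convolutions, so the hidden state keeps the spatial layout of the video frames. The abstract does not give the equations, so the following is a minimal single-channel sketch of the standard ConvGRU update (update gate, reset gate, candidate state); the kernel sizes, random parameters, and helper names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2D cross-correlation (the deep-learning
    'convolution'), single channel, for illustration only."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def conv_gru_step(x, h, params):
    """One ConvGRU step: every matrix product of a plain GRU
    becomes a convolution over the spatial feature map."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(conv2d_same(x, Wz) + conv2d_same(h, Uz))   # update gate
    r = sigmoid(conv2d_same(x, Wr) + conv2d_same(h, Ur))   # reset gate
    h_tilde = np.tanh(conv2d_same(x, Wh) + conv2d_same(r * h, Uh))
    return (1.0 - z) * h + z * h_tilde                     # gated blend

# Roll the unit over a short sequence of random "frames" (hypothetical sizes).
rng = np.random.default_rng(0)
params = tuple(0.1 * rng.standard_normal((3, 3)) for _ in range(6))
h = np.zeros((8, 8))
for _ in range(5):
    x = rng.standard_normal((8, 8))
    h = conv_gru_step(x, h, params)
print(h.shape)  # hidden state keeps the 8x8 spatial layout of the input
```

Because the recurrence is a convex combination of the previous state and a tanh candidate, the hidden activations stay bounded in (-1, 1) while accumulating order-dependent information across frames, which is what lets them encode the state transitions the abstract refers to.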
Cite
Text
Dwibedi et al. "Temporal Reasoning in Videos Using Convolutional Gated Recurrent Units." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2018.
Markdown
[Dwibedi et al. "Temporal Reasoning in Videos Using Convolutional Gated Recurrent Units." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2018.](https://mlanthology.org/cvprw/2018/dwibedi2018cvprw-temporal/)
BibTeX
@inproceedings{dwibedi2018cvprw-temporal,
title = {{Temporal Reasoning in Videos Using Convolutional Gated Recurrent Units}},
author = {Dwibedi, Debidatta and Sermanet, Pierre and Tompson, Jonathan},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2018},
pages = {1111--1116},
url = {https://mlanthology.org/cvprw/2018/dwibedi2018cvprw-temporal/}
}