Temporal Reasoning in Videos Using Convolutional Gated Recurrent Units
Abstract
Recently, deep-learning-based models have pushed the state of the art for the task of action recognition in videos. Yet, for many action recognition datasets like Kinetics and UCF101, the correct temporal order of frames does not seem to be essential to solving the task. We find that temporal order matters more for the recently introduced 20BN Something-Something dataset, where the task of fine-grained action recognition requires the model to perform temporal reasoning. We show that when temporal order matters, recurrent models can provide a significant boost in performance. Using qualitative methods, we show that when the task of action recognition requires temporal reasoning, the hidden states of the recurrent units encode meaningful state transitions.
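The convolutional gated recurrent unit named in the title replaces the fully connected products of a standard GRU with convolutions, so the hidden state keeps the spatial layout of the video frames. The abstract does not give the equations, so the following is a minimal single-channel sketch of the standard ConvGRU update (update gate, reset gate, candidate state); the kernel sizes, random parameters, and helper names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def conv2d_same(x, k):
    """Naive 'same'-padded 2D cross-correlation (the deep-learning
    'convolution'), single channel, for illustration only."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def conv_gru_step(x, h, params):
    """One ConvGRU step: every matrix product of a plain GRU
    becomes a convolution over the spatial feature map."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(conv2d_same(x, Wz) + conv2d_same(h, Uz))   # update gate
    r = sigmoid(conv2d_same(x, Wr) + conv2d_same(h, Ur))   # reset gate
    h_tilde = np.tanh(conv2d_same(x, Wh) + conv2d_same(r * h, Uh))
    return (1.0 - z) * h + z * h_tilde                     # gated blend

# Roll the unit over a short sequence of random "frames" (hypothetical sizes).
rng = np.random.default_rng(0)
params = tuple(0.1 * rng.standard_normal((3, 3)) for _ in range(6))
h = np.zeros((8, 8))
for _ in range(5):
    x = rng.standard_normal((8, 8))
    h = conv_gru_step(x, h, params)
print(h.shape)  # hidden state keeps the 8x8 spatial layout of the input
```

Because the recurrence is a convex combination of the previous state and a tanh candidate, the hidden activations stay bounded in (-1, 1) while accumulating order-dependent information across frames, which is what lets them encode the state transitions the abstract refers to.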
Cite
Text
Dwibedi et al. "Temporal Reasoning in Videos Using Convolutional Gated Recurrent Units." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2018.
Markdown
[Dwibedi et al. "Temporal Reasoning in Videos Using Convolutional Gated Recurrent Units." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2018.](https://mlanthology.org/cvprw/2018/dwibedi2018cvprw-temporal/)
BibTeX
@inproceedings{dwibedi2018cvprw-temporal,
title = {{Temporal Reasoning in Videos Using Convolutional Gated Recurrent Units}},
author = {Dwibedi, Debidatta and Sermanet, Pierre and Tompson, Jonathan},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2018},
pages = {1111--1116},
url = {https://mlanthology.org/cvprw/2018/dwibedi2018cvprw-temporal/}
}