Video BagNet: Short Temporal Receptive Fields Increase Robustness in Long-Term Action Recognition

Abstract

Previous work on long-term video action recognition relies on deep 3D-convolutional models that have a large temporal receptive field (RF). We argue that these models are not always the best choice for temporal modeling in videos. A large temporal receptive field allows the model to encode the exact sub-action order of a video, which decreases performance when test videos have a different sub-action order. In this work, we investigate whether we can improve model robustness to the sub-action order by shrinking the temporal receptive field of action recognition models. For this, we design Video BagNet, a variant of the 3D ResNet-50 model with the temporal receptive field size limited to 1, 9, 17 or 33 frames. We analyze Video BagNet on synthetic and real-world video datasets and experimentally compare models with varying temporal receptive fields. We find that models with short temporal receptive fields are robust to sub-action order changes, while models with larger temporal receptive fields are sensitive to the sub-action order.
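To make the notion of a limited temporal receptive field concrete, the sketch below computes the number of input frames that a single output feature can see after a stack of temporal convolutions. It uses the standard receptive-field recurrence for stacked convolutions. The function name `temporal_receptive_field` and the layer configurations are hypothetical and chosen only for illustration; they are not the actual Video BagNet layer layout, which the abstract does not specify.

```python
def temporal_receptive_field(layers):
    """Return the temporal receptive field, in frames, of stacked convolutions.

    `layers` is a sequence of (temporal_kernel_size, temporal_stride) pairs,
    ordered from the input to the output of the network.
    """
    rf = 1    # a single input frame sees only itself
    jump = 1  # distance in input frames between neighbouring features
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the field by (k - 1) * jump
        jump *= stride             # striding spreads later features further apart
    return rf


# Hypothetical configurations (assumptions, not taken from the paper):
# temporal kernels of size 1 never mix information across frames.
frame_level = [(1, 1)] * 16
# Stacking n stride-1 layers with temporal kernel size 3 gives 2n + 1 frames.
four_layers = [(3, 1)] * 4
eight_layers = [(3, 1)] * 8
sixteen_layers = [(3, 1)] * 16

for name, config in [
    ("frame_level", frame_level),
    ("four_layers", four_layers),
    ("eight_layers", eight_layers),
    ("sixteen_layers", sixteen_layers),
]:
    print(name, temporal_receptive_field(config))
```

Under these assumed configurations, replacing temporal kernels of size 3 with kernels of size 1 is one simple way to shrink the temporal receptive field without changing the network depth.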

Cite

Text

Strafforello et al. "Video BagNet: Short Temporal Receptive Fields Increase Robustness in Long-Term Action Recognition." IEEE/CVF International Conference on Computer Vision Workshops, 2023. doi:10.1109/ICCVW60793.2023.00023

Markdown

[Strafforello et al. "Video BagNet: Short Temporal Receptive Fields Increase Robustness in Long-Term Action Recognition." IEEE/CVF International Conference on Computer Vision Workshops, 2023.](https://mlanthology.org/iccvw/2023/strafforello2023iccvw-video/) doi:10.1109/ICCVW60793.2023.00023

BibTeX

@inproceedings{strafforello2023iccvw-video,
  title     = {{Video BagNet: Short Temporal Receptive Fields Increase Robustness in Long-Term Action Recognition}},
  author    = {Strafforello, Ombretta and Liu, Xin and Schutte, Klamer and van Gemert, Jan},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2023},
  pages     = {159-166},
  doi       = {10.1109/ICCVW60793.2023.00023},
  url       = {https://mlanthology.org/iccvw/2023/strafforello2023iccvw-video/}
}