Leveraging Self-Supervised Training for Unintentional Action Recognition

Abstract

Unintentional actions are rare occurrences that are difficult to define precisely and that are highly dependent on the temporal context of the action. In this work, we explore such actions and seek to identify the points in videos where the actions transition from intentional to unintentional. We propose a multi-stage framework that exploits inherent biases such as motion speed, motion direction, and order to recognize unintentional actions. To enhance representations via self-supervised training for the task of unintentional action recognition, we propose temporal transformations, called Temporal Transformations of Inherent Biases of Unintentional Actions (T$^2$IBUA). The multi-stage approach models the temporal information on both the level of individual frames and full clips. These enhanced representations show strong performance for unintentional action recognition tasks. We provide an extensive ablation study of our framework and report results that significantly improve over the state-of-the-art.
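To illustrate the kind of pretext task the abstract describes, below is a minimal sketch (not the authors' implementation) of temporal transformations targeting motion speed, motion direction, and order, with the transformation identity used as a self-supervised label. All function names, strides, and chunk counts here are illustrative assumptions.

import random
import numpy as np

def speed_transform(frames: np.ndarray, stride: int = 2) -> np.ndarray:
    # Alter motion speed by temporally subsampling the clip (assumed stride).
    return frames[::stride]

def direction_transform(frames: np.ndarray) -> np.ndarray:
    # Reverse motion direction by playing the clip backwards.
    return frames[::-1]

def order_transform(frames: np.ndarray, num_chunks: int = 4) -> np.ndarray:
    # Perturb temporal order by shuffling equal-sized chunks of the clip.
    chunks = np.array_split(frames, num_chunks)
    random.shuffle(chunks)
    return np.concatenate(chunks)

TRANSFORMS = [
    ("original", lambda f: f),
    ("speed", speed_transform),
    ("direction", direction_transform),
    ("order", order_transform),
]

def make_pretext_sample(frames: np.ndarray):
    # Return a transformed clip and the index of the applied transformation.
    # Predicting this index encourages the backbone to encode temporal cues.
    label = random.randrange(len(TRANSFORMS))
    _, fn = TRANSFORMS[label]
    return fn(frames), label

# Usage with a dummy clip of 16 RGB frames:
clip = np.zeros((16, 224, 224, 3), dtype=np.uint8)
transformed, label = make_pretext_sample(clip)
print(transformed.shape, label)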

Cite

Text

Duka et al. "Leveraging Self-Supervised Training for Unintentional Action Recognition." European Conference on Computer Vision Workshops, 2022. doi:10.1007/978-3-031-25069-9_5

Markdown

[Duka et al. "Leveraging Self-Supervised Training for Unintentional Action Recognition." European Conference on Computer Vision Workshops, 2022.](https://mlanthology.org/eccvw/2022/duka2022eccvw-leveraging/) doi:10.1007/978-3-031-25069-9_5

BibTeX

@inproceedings{duka2022eccvw-leveraging,
  title     = {{Leveraging Self-Supervised Training for Unintentional Action Recognition}},
  author    = {Duka, Enea and Kukleva, Anna and Schiele, Bernt},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2022},
  pages     = {69--85},
  doi       = {10.1007/978-3-031-25069-9_5},
  url       = {https://mlanthology.org/eccvw/2022/duka2022eccvw-leveraging/}
}