Rethinking Learning Approaches for Long-Term Action Anticipation

Abstract

Action anticipation involves predicting future actions given the observed initial portion of a video. Typically, the observed video is processed as a whole to obtain a video-level representation of the ongoing activity, which is then used for future prediction. We introduce ANTICIPATR, which performs long-term action anticipation by leveraging segment-level representations learned from individual segments across different activities, in addition to a video-level representation. We propose a two-stage learning approach to train a novel transformer-based model that uses these two types of representations to directly predict a set of future action instances over any given anticipation duration. Results on the Breakfast, 50Salads, Epic-Kitchens-55, and EGTEA Gaze+ datasets demonstrate the effectiveness of our approach.
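
The abstract's description of the model, a transformer that fuses segment-level and video-level representations and decodes a set of future action instances, can be made concrete with a small sketch. The PyTorch code below is an illustrative approximation, not the authors' implementation: the module names, feature dimensions, dual-encoder layout, and query-based set decoder are all assumptions inferred from the abstract.

# Illustrative sketch (not the authors' code): a transformer that combines a
# video-level representation with segment-level representations and decodes a
# fixed-size set of candidate future action instances. All names, dimensions,
# and the query-based decoding scheme are assumptions.
import torch
import torch.nn as nn

class AnticipationSketch(nn.Module):
    def __init__(self, feat_dim=2048, d_model=256, num_queries=20,
                 num_classes=48, nhead=8, num_layers=3):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Separate encoders for segment-level and video-level context
        # (nn.TransformerEncoder deep-copies the layer, so weights are not shared).
        self.segment_encoder = nn.TransformerEncoder(enc_layer, num_layers)
        self.video_encoder = nn.TransformerEncoder(enc_layer, num_layers)
        dec_layer = nn.TransformerDecoderLayer(d_model, nhead, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers)
        # One learned query per predicted future action instance.
        self.queries = nn.Embedding(num_queries, d_model)
        self.class_head = nn.Linear(d_model, num_classes + 1)  # +1 for "no action"
        self.time_head = nn.Linear(d_model, 2)  # normalized (start, end) times

    def forward(self, segment_feats, video_feats):
        # segment_feats: (B, S, feat_dim) features of observed segments
        # video_feats:   (B, T, feat_dim) frame features of the observed video
        seg = self.segment_encoder(self.proj(segment_feats))
        vid = self.video_encoder(self.proj(video_feats))
        memory = torch.cat([seg, vid], dim=1)  # fuse both representations
        q = self.queries.weight.unsqueeze(0).expand(memory.size(0), -1, -1)
        h = self.decoder(q, memory)
        # Each query yields one candidate future action instance.
        return self.class_head(h), self.time_head(h).sigmoid()

model = AnticipationSketch()
logits, times = model(torch.randn(2, 5, 2048), torch.randn(2, 64, 2048))
print(logits.shape, times.shape)  # (2, 20, 49) and (2, 20, 2)

In this reading, each query produces one candidate future action (a class plus normalized start and end times), so a variable-length set of predictions over any anticipation duration can be obtained by discarding queries classified as "no action".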

Cite

Text

Nawhal et al. "Rethinking Learning Approaches for Long-Term Action Anticipation." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19830-4

Markdown

[Nawhal et al. "Rethinking Learning Approaches for Long-Term Action Anticipation." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/nawhal2022eccv-rethinking/) doi:10.1007/978-3-031-19830-4

BibTeX

@inproceedings{nawhal2022eccv-rethinking,
  title     = {{Rethinking Learning Approaches for Long-Term Action Anticipation}},
  author    = {Nawhal, Megha and Jyothi, Akash Abdu and Mori, Greg},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-19830-4},
  url       = {https://mlanthology.org/eccv/2022/nawhal2022eccv-rethinking/}
}