Rethinking Learning Approaches for Long-Term Action Anticipation
Abstract
Action anticipation involves predicting future actions after observing the initial portion of a video. Typically, the observed video is processed as a whole to obtain a video-level representation of the ongoing activity, which is then used for future prediction. We introduce ANTICIPATR, which performs long-term action anticipation by leveraging segment-level representations learned from individual segments across different activities, in addition to a video-level representation. We propose a two-stage learning approach to train a novel transformer-based model that uses these two types of representations to directly predict a set of future action instances over any given anticipation duration. Results on the Breakfast, 50Salads, Epic-Kitchens-55, and EGTEA Gaze+ datasets demonstrate the effectiveness of our approach.
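To make the described architecture concrete, below is a minimal, hypothetical PyTorch sketch of a model in this spirit: segment-level features and a video-level feature are fused into a shared memory, and a set of learned queries is decoded into future action instances (class plus start/end times), DETR-style. All names, dimensions, and layer choices (`AnticipationModel`, `num_queries`, head sizes) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a two-representation anticipation model as described
# in the abstract. Architecture details are assumptions, not the paper's code.
import torch
import torch.nn as nn

class AnticipationModel(nn.Module):
    """Predicts a set of future action instances from an observed video.

    Combines segment-level representations with a video-level representation,
    then decodes a fixed-size set of future-action queries (set prediction).
    Conditioning on the anticipation duration is omitted for brevity.
    """

    def __init__(self, feat_dim=512, num_classes=48, num_queries=20):
        super().__init__()
        # Encoder over per-segment features (assumed design choice).
        self.segment_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.video_proj = nn.Linear(feat_dim, feat_dim)
        # One learned query per predicted future action instance.
        self.queries = nn.Parameter(torch.randn(num_queries, feat_dim))
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=feat_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        # Each query yields an action class and normalized (start, end) times.
        self.class_head = nn.Linear(feat_dim, num_classes + 1)  # +1: "no action"
        self.time_head = nn.Linear(feat_dim, 2)

    def forward(self, segment_feats, video_feat):
        # segment_feats: (B, num_segments, feat_dim); video_feat: (B, feat_dim)
        memory = self.segment_encoder(segment_feats)
        # Append the video-level representation as an extra memory token.
        memory = torch.cat([memory, self.video_proj(video_feat).unsqueeze(1)], dim=1)
        q = self.queries.unsqueeze(0).expand(segment_feats.size(0), -1, -1)
        out = self.decoder(q, memory)
        return self.class_head(out), self.time_head(out).sigmoid()

# Example: a batch of 2 videos, each with 8 observed 512-d segment features.
model = AnticipationModel()
class_logits, times = model(torch.randn(2, 8, 512), torch.randn(2, 512))
```

In such a set-prediction setup, training would typically match predicted instances to ground-truth future actions (e.g., via bipartite matching) before applying classification and temporal regression losses; the specifics here are not drawn from the paper.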
Cite
Text
Nawhal et al. "Rethinking Learning Approaches for Long-Term Action Anticipation." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19830-4
Markdown
[Nawhal et al. "Rethinking Learning Approaches for Long-Term Action Anticipation." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/nawhal2022eccv-rethinking/) doi:10.1007/978-3-031-19830-4
BibTeX
@inproceedings{nawhal2022eccv-rethinking,
title = {{Rethinking Learning Approaches for Long-Term Action Anticipation}},
author = {Nawhal, Megha and Jyothi, Akash Abdu and Mori, Greg},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2022},
doi = {10.1007/978-3-031-19830-4},
url = {https://mlanthology.org/eccv/2022/nawhal2022eccv-rethinking/}
}