Leveraging the Present to Anticipate the Future in Videos
Abstract
Anticipating actions before they are executed is crucial for a wide range of practical applications, including autonomous driving and the moderation of live video streaming. While most prior work in this area requires partial observation of executed actions, in this paper we focus on anticipating actions seconds before they start. Our proposed approach is the fusion of a purely anticipatory model with a complementary model constrained to reason about the present. In particular, the latter predicts present action and scene attributes, and reasons about how they evolve over time. By doing so, we aim to model action anticipation at a more conceptual level than directly predicting future actions. Our model outperforms previously reported methods on the EPIC-KITCHENS and Breakfast datasets.
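As a rough illustration of the two-branch idea described in the abstract, the following is a minimal sketch, assuming PyTorch; the module names (PredictiveBranch, TransitionalBranch, FusionModel), layer choices, and dimensions are hypothetical and do not reproduce the authors' exact architecture.

```python
# Minimal sketch (not the authors' exact model): fuse a purely anticipatory
# branch with a branch that first reasons about the present, then maps
# present action/attribute scores to future-action logits.
import torch
import torch.nn as nn


class PredictiveBranch(nn.Module):
    """Maps observed video features directly to future-action logits."""
    def __init__(self, feat_dim, num_actions):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_actions)

    def forward(self, feats):
        return self.fc(feats)


class TransitionalBranch(nn.Module):
    """Recognizes present actions/attributes, then models how they
    transition into future actions."""
    def __init__(self, feat_dim, num_attributes, num_actions):
        super().__init__()
        self.present = nn.Linear(feat_dim, num_attributes)        # present recognition
        self.transition = nn.Linear(num_attributes, num_actions)  # present -> future

    def forward(self, feats):
        present_scores = torch.sigmoid(self.present(feats))
        return self.transition(present_scores)


class FusionModel(nn.Module):
    """Averages the logits of the two complementary branches."""
    def __init__(self, feat_dim, num_attributes, num_actions):
        super().__init__()
        self.predictive = PredictiveBranch(feat_dim, num_actions)
        self.transitional = TransitionalBranch(feat_dim, num_attributes, num_actions)

    def forward(self, feats):
        return 0.5 * (self.predictive(feats) + self.transitional(feats))


# Usage: feats would be pooled clip features from the observed video segment.
model = FusionModel(feat_dim=2048, num_attributes=300, num_actions=2513)
logits = model(torch.randn(4, 2048))  # (batch, num_actions)
print(logits.shape)
```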
Cite
Text
Miech et al. "Leveraging the Present to Anticipate the Future in Videos." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019. doi:10.1109/CVPRW.2019.00351
Markdown
[Miech et al. "Leveraging the Present to Anticipate the Future in Videos." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.](https://mlanthology.org/cvprw/2019/miech2019cvprw-leveraging/) doi:10.1109/CVPRW.2019.00351
BibTeX
@inproceedings{miech2019cvprw-leveraging,
title = {{Leveraging the Present to Anticipate the Future in Videos}},
author = {Miech, Antoine and Laptev, Ivan and Sivic, Josef and Wang, Heng and Torresani, Lorenzo and Tran, Du},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2019},
pages = {2915--2922},
doi = {10.1109/CVPRW.2019.00351},
url = {https://mlanthology.org/cvprw/2019/miech2019cvprw-leveraging/}
}