When Will You Do What? - Anticipating Temporal Occurrences of Activities
Abstract
Analyzing human actions in videos has gained increased attention recently. While most works focus on classifying and labeling observed video frames or anticipating the very recent future, making long-term predictions over more than just a few seconds is a task with many practical applications that has not yet been addressed. In this paper, we propose two methods to predict a considerably large number of future actions and their durations. Both a CNN and an RNN are trained to learn future video labels based on previously seen content. We show that our methods generate accurate predictions of the future even for long videos with a large number of different actions and can even deal with noisy or erroneous input information.
Cite
Text
Farha et al. "When Will You Do What? - Anticipating Temporal Occurrences of Activities." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018. doi:10.1109/CVPR.2018.00560
Markdown
[Farha et al. "When Will You Do What? - Anticipating Temporal Occurrences of Activities." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.](https://mlanthology.org/cvpr/2018/farha2018cvpr-you/) doi:10.1109/CVPR.2018.00560
BibTeX
@inproceedings{farha2018cvpr-you,
title = {{When Will You Do What? - Anticipating Temporal Occurrences of Activities}},
author = {Farha, Yazan Abu and Richard, Alexander and Gall, Juergen},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2018},
doi = {10.1109/CVPR.2018.00560},
url = {https://mlanthology.org/cvpr/2018/farha2018cvpr-you/}
}