Long-Term Action Forecasting Using Multi-Headed Attention-Based Variational Recurrent Neural Networks

Abstract

Systems that predict both a future action and how long a person will take to perform it must account for the inherent uncertainty in human behavior. Here, we present a novel hybrid generative model for action anticipation that attempts to capture this uncertainty. Our model uses a multi-headed attention-based variational generative model for action prediction (MAVAP) and Gaussian log-likelihood maximization to predict the corresponding action’s duration. During training, we optimize three losses: a variational loss, a negative log-likelihood loss, and a discriminative cross-entropy loss. We evaluate our model on the Breakfast and 50Salads benchmark datasets for action forecasting and demonstrate improvements over prior methods using both ground truth observations and predicted features from an action segmentation network (MS-TCN++). We also show that factorizing the latent space across multiple Gaussian heads yields more plausible future action sequences than a single Gaussian.
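
The abstract names three training losses without giving their form. As a rough illustration only, the sketch below shows one common way such terms are combined for a single predicted step. It is not the authors' implementation: the function names, the standard-normal prior in the KL term, the unweighted sum of the cross-entropy and duration terms, and the `kl_weight` parameter are all assumptions made for this example.

```python
import math


def cross_entropy(logits, target):
    """Discriminative loss: negative log-softmax of the true action class."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def gaussian_nll(duration, mu, log_var):
    """Negative log-likelihood of a duration under N(mu, exp(log_var))."""
    return 0.5 * (
        math.log(2 * math.pi) + log_var + (duration - mu) ** 2 / math.exp(log_var)
    )


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)).

    Assumption: a fixed standard-normal prior. A variational recurrent
    model may instead use a learned, time-dependent prior.
    """
    return 0.5 * sum(
        math.exp(lv) + m_i * m_i - 1.0 - lv for m_i, lv in zip(mu, log_var)
    )


def total_loss(logits, target, duration, dur_mu, dur_log_var, heads, kl_weight=1.0):
    """Sum the three terms; `heads` is a list of (mu, log_var) pairs,
    one per Gaussian head (hypothetical layout for this sketch)."""
    kl = sum(kl_to_standard_normal(mu, lv) for mu, lv in heads)
    return (
        cross_entropy(logits, target)
        + gaussian_nll(duration, dur_mu, dur_log_var)
        + kl_weight * kl
    )


# Example call with two latent heads and three action classes (made-up values).
heads = [([0.0, 0.0], [0.0, 0.0]), ([0.5], [0.0])]
loss = total_loss([2.0, 0.5, -1.0], 0, 3.0, 2.5, 0.0, heads)
```

Summing the KL term over heads reflects a latent space factorized into independent Gaussians, in which case the divergences add; how the paper actually weights or combines its heads is not stated in the abstract.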

Cite

Text

Loh et al. "Long-Term Action Forecasting Using Multi-Headed Attention-Based Variational Recurrent Neural Networks." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022. doi:10.1109/CVPRW56347.2022.00270

Markdown

[Loh et al. "Long-Term Action Forecasting Using Multi-Headed Attention-Based Variational Recurrent Neural Networks." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022.](https://mlanthology.org/cvprw/2022/loh2022cvprw-longterm/) doi:10.1109/CVPRW56347.2022.00270

BibTeX

@inproceedings{loh2022cvprw-longterm,
  title     = {{Long-Term Action Forecasting Using Multi-Headed Attention-Based Variational Recurrent Neural Networks}},
  author    = {Loh, Siyuan Brandon and Roy, Debaditya and Fernando, Basura},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2022},
  pages     = {2418-2426},
  doi       = {10.1109/CVPRW56347.2022.00270},
  url       = {https://mlanthology.org/cvprw/2022/loh2022cvprw-longterm/}
}