Learning Latent Super-Events to Detect Multiple Activities in Videos

Abstract

In this paper, we introduce the concept of learning latent super-events from activity videos, and present how it benefits activity detection in continuous videos. We define a super-event as a set of multiple events occurring together in videos with a particular temporal organization; it is the counterpart of the sub-event concept. Real-world videos contain multiple activities and are rarely segmented (e.g., surveillance videos), and learning latent super-events allows the model to capture how the events are temporally related in videos. We design temporal structure filters that enable the model to focus on particular sub-intervals of the videos, and use them together with a soft attention mechanism to learn representations of latent super-events. Super-event representations are combined with per-frame or per-segment CNNs to provide frame-level annotations. Our approach is designed to be fully differentiable, enabling end-to-end learning of latent super-event representations jointly with the activity detector that uses them. Our experiments with multiple public video datasets confirm that the proposed concept of latent super-event learning significantly benefits activity detection, advancing the state of the art.
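
As a rough illustration of the idea of temporal structure filters combined with soft attention, the following minimal sketch pools per-frame features into a single super-event vector. The abstract does not specify the filter shape or parameterization, so everything here is an assumption made for illustration: the Cauchy-shaped kernel, the names `temporal_filter`, `softmax`, and `super_event`, and the parameters `centers`, `widths`, and `attn_logits` (which would be learned in a real model but are fixed numbers here).

```python
import math

def temporal_filter(center, width, num_frames):
    """Assumed Cauchy-shaped weights over frame indices, normalized to sum to 1."""
    raw = [1.0 / (math.pi * width * (1.0 + ((t - center) / width) ** 2))
           for t in range(num_frames)]
    total = sum(raw)
    return [r / total for r in raw]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def super_event(features, centers, widths, attn_logits):
    """Pool T per-frame feature vectors (each of length D) into one length-D vector.

    Each filter focuses on a sub-interval of the video; soft attention
    weights decide how much each filter contributes to the result.
    """
    num_frames, dim = len(features), len(features[0])
    attn = softmax(attn_logits)
    out = [0.0] * dim
    for a, c, w in zip(attn, centers, widths):
        filt = temporal_filter(c, w, num_frames)
        for t, weight in enumerate(filt):
            for d in range(dim):
                out[d] += a * weight * features[t][d]
    return out

# Hypothetical usage: 8 frames of 2-dimensional features, 3 filters.
frames = [[float(t), 1.0] for t in range(8)]
rep = super_event(frames, centers=[1.0, 4.0, 6.0],
                  widths=[1.0, 2.0, 0.5], attn_logits=[0.2, 1.0, -0.3])
```

Because every operation is a smooth function of the centers, widths, and attention logits, a sketch like this is compatible with the end-to-end differentiable training the abstract describes; in such a setup the resulting vector would be combined with per-frame CNN features to produce frame-level predictions.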

Cite

Text

Piergiovanni and Ryoo. "Learning Latent Super-Events to Detect Multiple Activities in Videos." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018. doi:10.1109/CVPR.2018.00556

Markdown

[Piergiovanni and Ryoo. "Learning Latent Super-Events to Detect Multiple Activities in Videos." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.](https://mlanthology.org/cvpr/2018/piergiovanni2018cvpr-learning/) doi:10.1109/CVPR.2018.00556

BibTeX

@inproceedings{piergiovanni2018cvpr-learning,
  title     = {{Learning Latent Super-Events to Detect Multiple Activities in Videos}},
  author    = {Piergiovanni, AJ and Ryoo, Michael S.},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2018},
  doi       = {10.1109/CVPR.2018.00556},
  url       = {https://mlanthology.org/cvpr/2018/piergiovanni2018cvpr-learning/}
}