Localizing the Common Action Among a Few Videos

Abstract

This paper strives to localize the temporal extent of an action in a long untrimmed video. Whereas existing work leverages many examples with their start, their ending, and/or the class of the action during training, we propose few-shot common action localization. The start and end of an action in a long untrimmed video are determined based on just a handful of trimmed video examples containing the same action, without knowing their common class label. To address this task, we introduce a new 3D convolutional network architecture able to align representations from the support videos with the relevant query video segments. The network contains: (i) a mutual enhancement module to simultaneously complement the representation of the few trimmed support videos and the untrimmed query video; (ii) a progressive alignment module that iteratively fuses the support videos into the query branch; and (iii) a pairwise matching module to weigh the importance of different support videos. Evaluation of few-shot common action localization in untrimmed videos containing a single or multiple action instances demonstrates the effectiveness and general applicability of our proposal.
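To make the idea of weighing support videos more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each video has already been reduced to a single feature vector, and it uses cosine similarity followed by a softmax as a stand-in weighting rule; the abstract does not specify how the pairwise matching module computes its weights. The function names `cosine`, `support_weights`, and `fuse_supports` are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def support_weights(query: Sequence[float],
                    supports: List[Sequence[float]]) -> List[float]:
    """Assumed weighting rule: softmax over query-support cosine similarities."""
    sims = [cosine(query, s) for s in supports]
    peak = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_supports(query: Sequence[float],
                  supports: List[Sequence[float]]) -> List[float]:
    """Weighted average of the support vectors, one value per feature dimension."""
    weights = support_weights(query, supports)
    return [
        sum(w * s[d] for w, s in zip(weights, supports))
        for d in range(len(query))
    ]


if __name__ == "__main__":
    query = [1.0, 0.0]
    supports = [[1.0, 0.0], [0.0, 1.0]]
    print(support_weights(query, supports))
    print(fuse_supports(query, supports))
```

In this toy setting, the support vector that points in the same direction as the query receives the larger weight, so it dominates the fused representation. A learned module would replace the fixed cosine-plus-softmax rule with trainable parameters.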

Cite

Text

Yang et al. "Localizing the Common Action Among a Few Videos." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58571-6_30

Markdown

[Yang et al. "Localizing the Common Action Among a Few Videos." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/yang2020eccv-localizing/) doi:10.1007/978-3-030-58571-6_30

BibTeX

@inproceedings{yang2020eccv-localizing,
  title     = {{Localizing the Common Action Among a Few Videos}},
  author    = {Yang, Pengwan and Hu, Vincent Tao and Mettes, Pascal and Snoek, Cees G. M.},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2020},
  doi       = {10.1007/978-3-030-58571-6_30},
  url       = {https://mlanthology.org/eccv/2020/yang2020eccv-localizing/}
}