Modeling Sub-Event Dynamics in First-Person Action Recognition

Abstract

First-person videos have unique characteristics such as heavy egocentric motion, strong preceding events, salient transitional activities and post-event impacts. Action recognition methods designed for third-person videos may not optimally represent actions captured by first-person videos. We propose a method to represent the high-level dynamics of sub-events in first-person videos by dynamically pooling features of sub-intervals of time series using a temporal feature pooling function. The sub-event dynamics are then temporally aligned to make a new series. To keep track of how the sub-event dynamics evolve over time, we recursively employ the Fast Fourier Transform on a pyramidal temporal structure. The Fourier coefficients of the segment define the overall video representation. We perform experiments on two existing benchmark first-person video datasets, both of which were captured in a controlled environment. To address this limitation, we introduce a new dataset collected from YouTube which has a larger number of classes and a greater diversity of capture conditions, thereby more closely depicting real-world challenges in first-person video analysis. We compare our method to state-of-the-art first-person and generic video recognition algorithms. Our method consistently outperforms the nearest competitors by 10.3%, 3.3% and 11.7%, respectively, on the three datasets.
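
The pipeline outlined in the abstract (pool features over sub-intervals, order the pooled vectors into a new series, then apply the Fast Fourier Transform over a temporal pyramid) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the choice of max pooling, the window and stride values, the number of pyramid levels and the number of retained coefficients are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def pool_subevents(features, window=8, stride=4):
    """Max-pool per-frame features over sliding sub-intervals.

    features: array of shape (T, D), one D-dimensional vector per frame.
    Returns an array of shape (num_windows, D), in temporal order.
    Max pooling is an assumed stand-in for the paper's pooling function.
    """
    num_frames = features.shape[0]
    starts = range(0, max(num_frames - window, 0) + 1, stride)
    return np.stack([features[s:s + window].max(axis=0) for s in starts])


def fourier_pyramid(series, levels=3, n_coeffs=4):
    """Describe a series by low-frequency FFT magnitudes on a temporal pyramid.

    At pyramid level l the series is split into 2**l segments along time.
    For each segment, the magnitudes of the first n_coeffs Fourier
    coefficients of every feature dimension are kept and concatenated.
    """
    if series.shape[0] < 2 ** (levels - 1):
        raise ValueError("series is too short for the requested pyramid depth")
    parts = []
    for level in range(levels):
        for segment in np.array_split(series, 2 ** level, axis=0):
            # Zero-pad short segments so n_coeffs coefficients always exist.
            n = max(segment.shape[0], n_coeffs)
            spectrum = np.abs(np.fft.fft(segment, n=n, axis=0))[:n_coeffs]
            parts.append(spectrum.ravel())
    return np.concatenate(parts)


# Hypothetical input: 40 frames of 8-dimensional per-frame features.
rng = np.random.default_rng(0)
frame_features = rng.random((40, 8))
subevent_series = pool_subevents(frame_features)
video_descriptor = fourier_pyramid(subevent_series)
```

With these assumed settings the descriptor length is fixed by the pyramid depth, the number of retained coefficients and the feature dimension, not by the video length, which is what makes videos of different durations comparable.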

Cite

Text

Zaki et al. "Modeling Sub-Event Dynamics in First-Person Action Recognition." Conference on Computer Vision and Pattern Recognition, 2017. doi:10.1109/CVPR.2017.176

Markdown

[Zaki et al. "Modeling Sub-Event Dynamics in First-Person Action Recognition." Conference on Computer Vision and Pattern Recognition, 2017.](https://mlanthology.org/cvpr/2017/zaki2017cvpr-modeling/) doi:10.1109/CVPR.2017.176

BibTeX

@inproceedings{zaki2017cvpr-modeling,
  title     = {{Modeling Sub-Event Dynamics in First-Person Action Recognition}},
  author    = {Zaki, Hasan F. M. and Shafait, Faisal and Mian, Ajmal},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2017},
  doi       = {10.1109/CVPR.2017.176},
  url       = {https://mlanthology.org/cvpr/2017/zaki2017cvpr-modeling/}
}