Temporal Domain Neural Encoder for Video Representation Learning

Abstract

We address the challenge of learning effective video representations by explicitly modeling the relationships between visual concepts over time. We propose a novel Temporal Preserving Recurrent Neural Network (TPRNN) that takes frame-level features as input and extracts and encodes visual dynamics from them. The proposed architecture captures temporal dynamics by keeping track of the ordinal relationships among co-occurring visual concepts, and it constructs video representations from their temporal order patterns. The resulting video representations effectively encode the temporal information of dynamic patterns, which makes them more discriminative for human actions that are performed with different sequences of action patterns. We evaluate the proposed model on several real-world video datasets, and the results show that it outperforms the baseline models. In particular, we observe significant improvement on action classes that can only be distinguished by capturing the temporal order of action patterns.
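The abstract's core argument is that an order-agnostic summary of frame-level features cannot separate actions built from the same patterns in a different order, whereas a recurrent encoder can. The sketch below illustrates only that general point. It is not TPRNN: it contrasts mean pooling with a plain Elman-style recurrence, and the function names (`mean_pool`, `recurrent_encode`), the two-concept toy frames, and the weights are all hypothetical values chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mean_pool(frames: Sequence[Vector]) -> Vector:
    """Order-agnostic baseline: average frame-level features over time."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def recurrent_encode(frames: Sequence[Vector], w_hh: Matrix, w_xh: Matrix) -> Vector:
    """Toy Elman-style recurrence h_t = tanh(W_hh h_{t-1} + W_xh x_t).

    Hypothetical illustration only (not the TPRNN update). The final hidden
    state depends on the order in which the frames arrive.
    """
    hidden = len(w_hh)
    h = [0.0] * hidden
    for x in frames:
        h = [
            math.tanh(
                sum(w_hh[i][j] * h[j] for j in range(hidden))
                + sum(w_xh[i][k] * x[k] for k in range(len(x)))
            )
            for i in range(hidden)
        ]
    return h


# Two hypothetical "visual concepts": A fires in one frame, B in the other.
frames_ab = [[1.0, 0.0], [0.0, 1.0]]   # concept A, then concept B
frames_ba = list(reversed(frames_ab))  # same concepts, opposite order

# Arbitrary illustrative weights (assumed, not learned).
W_HH = [[0.5, -0.3], [0.2, 0.4]]
W_XH = [[1.0, 0.0], [0.0, 1.0]]

h_ab = recurrent_encode(frames_ab, W_HH, W_XH)
h_ba = recurrent_encode(frames_ba, W_HH, W_XH)

print(mean_pool(frames_ab) == mean_pool(frames_ba))  # True: order is lost
print(h_ab != h_ba)                                  # True: order is kept
```

Mean pooling maps both orderings to the same vector, so no downstream classifier could tell them apart. The recurrent state differs, which is the kind of temporal-order information the abstract says distinguishes otherwise confusable action classes.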

Cite

Text

Hu et al. "Temporal Domain Neural Encoder for Video Representation Learning." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2017. doi:10.1109/CVPRW.2017.272

Markdown

[Hu et al. "Temporal Domain Neural Encoder for Video Representation Learning." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2017.](https://mlanthology.org/cvprw/2017/hu2017cvprw-temporal/) doi:10.1109/CVPRW.2017.272

BibTeX

@inproceedings{hu2017cvprw-temporal,
  title     = {{Temporal Domain Neural Encoder for Video Representation Learning}},
  author    = {Hu, Hao and Wang, Zhaowen and Lee, Joon-Young and Lin, Zhe and Qi, Guo-Jun},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2017},
  pages     = {2192--2199},
  doi       = {10.1109/CVPRW.2017.272},
  url       = {https://mlanthology.org/cvprw/2017/hu2017cvprw-temporal/}
}