Learning Event Representations by Encoding the Temporal Context

Abstract

This work aims at learning image representations suitable for event segmentation, a largely unexplored problem in the computer vision literature. The proposed approach is a self-supervised neural network that captures patterns of temporal overlap by learning to predict the feature vectors of neighboring frames given that of the current frame. The model is inspired by recent experimental findings in neuroscience showing that stimuli associated with similar temporal contexts are grouped together in the representational space. Experiments performed on image sequences captured at regular intervals show that a representation able to encode the temporal context yields very promising results on the task of temporal segmentation.
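
To make the idea concrete, here is a minimal sketch, not the authors' implementation, of a self-supervised network that, given the precomputed feature vector of frame t, predicts the feature vectors of its temporal neighbors. The feature dimensionality, context width, layer sizes, and all names below are illustrative assumptions.

# Sketch (assumed architecture, not the paper's code): predict the feature
# vectors of a frame's temporal neighbors from the frame's own features.
import torch
import torch.nn as nn

FEAT_DIM = 512   # dimensionality of precomputed frame features (assumed)
CONTEXT = 2      # neighbors predicted on each side of frame t (assumed)

class TemporalContextPredictor(nn.Module):
    """Maps the embedding of frame t to predicted embeddings of its neighbors."""
    def __init__(self, feat_dim=FEAT_DIM, context=CONTEXT, hidden=256):
        super().__init__()
        # shared hidden representation: this is the learned "temporal context" code
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        # one linear head per neighbor offset (-context..-1, +1..+context)
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, feat_dim) for _ in range(2 * context)]
        )

    def forward(self, x):
        h = self.encoder(x)
        return torch.stack([head(h) for head in self.heads], dim=1)

def train_step(model, optimizer, frames):
    """One self-supervised step on a (T, FEAT_DIM) tensor of frame features."""
    t = torch.arange(CONTEXT, frames.size(0) - CONTEXT)  # valid center frames
    targets = torch.stack(
        [frames[t + o] for o in range(-CONTEXT, CONTEXT + 1) if o != 0], dim=1
    )
    preds = model(frames[t])
    loss = nn.functional.mse_loss(preds, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Under these assumptions, frames with similar temporal contexts receive similar hidden representations, so event boundaries could be located by comparing consecutive hidden vectors (e.g., by thresholding or clustering their distances); the paper's actual boundary-detection procedure may differ.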

Cite

Text

Dias and Dimiccoli. "Learning Event Representations by Encoding the Temporal Context." European Conference on Computer Vision Workshops, 2018. doi:10.1007/978-3-030-11015-4_44

Markdown

[Dias and Dimiccoli. "Learning Event Representations by Encoding the Temporal Context." European Conference on Computer Vision Workshops, 2018.](https://mlanthology.org/eccvw/2018/dias2018eccvw-learning/) doi:10.1007/978-3-030-11015-4_44

BibTeX

@inproceedings{dias2018eccvw-learning,
  title     = {{Learning Event Representations by Encoding the Temporal Context}},
  author    = {Dias, Catarina and Dimiccoli, Mariella},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2018},
  pages     = {587--596},
  doi       = {10.1007/978-3-030-11015-4_44},
  url       = {https://mlanthology.org/eccvw/2018/dias2018eccvw-learning/}
}