Measuring the Importance of Temporal Features in Video Saliency

Abstract

Where people look when watching videos is believed to be heavily influenced by temporal patterns. In this work, we test this assumption by quantifying the extent to which gaze on recent video saliency benchmarks can be predicted by a static baseline model. On the recent LEDOV dataset, we find that at least 75% of the explainable information, as defined by a gold standard model, can be explained using static features. Our baseline model "DeepGaze MR" even outperforms state-of-the-art video saliency models, despite deliberately ignoring all temporal patterns. Visual inspection of our static baseline’s failure cases shows that clear temporal effects on human gaze placement exist, but are both rare in the dataset and not captured by any of the recent video saliency models. To focus the development of video saliency models on better capturing temporal effects, we construct a meta-dataset consisting of those examples requiring temporal information.
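
The "explainable information" figure above can be read as a ratio of log-likelihood gains. The following is a minimal sketch, assuming the common information-gain formulation in which a model, a baseline, and a gold standard are each scored by average log-likelihood. The function name and the numbers are hypothetical and are not taken from the paper.

```python
def information_gain_explained(ll_model, ll_baseline, ll_gold):
    """Fraction of the explainable information gain captured by a model.

    All arguments are average log-likelihoods in the same unit
    (for example, bits per fixation). This formulation is an
    assumption made for illustration, not the paper's stated code.
    """
    explainable = ll_gold - ll_baseline
    if explainable <= 0:
        raise ValueError("gold standard must score above the baseline")
    return (ll_model - ll_baseline) / explainable


# Placeholder scores chosen for illustration only (NOT results from the paper).
ratio = information_gain_explained(ll_model=1.0, ll_baseline=0.5, ll_gold=1.5)
print(f"{ratio:.0%} of explainable information gain")
```

Under this reading, a static model that reaches a ratio of 0.75 leaves at most a quarter of the explainable information to be attributed to temporal features.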

Cite

Text

Tangemann et al. "Measuring the Importance of Temporal Features in Video Saliency." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58604-1_40

Markdown

[Tangemann et al. "Measuring the Importance of Temporal Features in Video Saliency." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/tangemann2020eccv-measuring/) doi:10.1007/978-3-030-58604-1_40

BibTeX

@inproceedings{tangemann2020eccv-measuring,
  title     = {{Measuring the Importance of Temporal Features in Video Saliency}},
  author    = {Tangemann, Matthias and Kümmerer, Matthias and Wallis, Thomas S.A. and Bethge, Matthias},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2020},
  doi       = {10.1007/978-3-030-58604-1_40},
  url       = {https://mlanthology.org/eccv/2020/tangemann2020eccv-measuring/}
}