Measuring the Importance of Temporal Features in Video Saliency
Abstract
Where people look when watching videos is believed to be heavily influenced by temporal patterns. In this work, we test this assumption by quantifying the extent to which gaze on recent video saliency benchmarks can be predicted by a static baseline model. On the recent LEDOV dataset, we find that at least 75% of the explainable information, as defined by a gold standard model, can be explained using static features. Our baseline model "DeepGaze MR" even outperforms state-of-the-art video saliency models, despite deliberately ignoring all temporal patterns. Visual inspection of our static baseline's failure cases shows that clear temporal effects on human gaze placement exist, but are both rare in the dataset and not captured by any of the recent video saliency models. To focus the development of video saliency models on better capturing temporal effects, we construct a meta-dataset consisting of those examples requiring temporal information.
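The abstract does not spell out how the share of explainable information is computed. A minimal sketch of one common reading, assuming the share is a model's log-likelihood gain over a baseline divided by the gold standard's gain over that same baseline, is given below. The function name, the parameter names, and all numeric values are hypothetical and are not taken from the paper.

```python
def explained_information_fraction(ll_model, ll_baseline, ll_gold):
    """Share of the explainable information gain that a model captures.

    Assumption (not stated in the abstract): all three arguments are
    average log-likelihoods in the same unit, e.g. bits per fixation,
    and the gold standard model defines the upper bound.
    """
    explainable = ll_gold - ll_baseline
    if explainable <= 0:
        raise ValueError("gold standard must score above the baseline")
    return (ll_model - ll_baseline) / explainable


# Made-up illustrative values, not results from the paper.
fraction = explained_information_fraction(
    ll_model=1.0, ll_baseline=0.5, ll_gold=2.5
)
print(f"{fraction:.0%} of the explainable information is captured")
```

Under this reading, a static model that reaches a high fraction leaves little room for temporal features to add predictive power on that dataset.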
Cite
Text
Tangemann et al. "Measuring the Importance of Temporal Features in Video Saliency." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58604-1_40
Markdown
[Tangemann et al. "Measuring the Importance of Temporal Features in Video Saliency." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/tangemann2020eccv-measuring/) doi:10.1007/978-3-030-58604-1_40
BibTeX
@inproceedings{tangemann2020eccv-measuring,
title = {{Measuring the Importance of Temporal Features in Video Saliency}},
author = {Tangemann, Matthias and Kümmerer, Matthias and Wallis, Thomas S.A. and Bethge, Matthias},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2020},
doi = {10.1007/978-3-030-58604-1_40},
url = {https://mlanthology.org/eccv/2020/tangemann2020eccv-measuring/}
}