TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection

Min, Kyle; Corso, Jason J.

doi:10.1109/ICCV.2019.00248

ICCV 2019

Abstract

TASED-Net is a 3D fully-convolutional network architecture for video saliency detection. It consists of two building blocks: first, an encoder network extracts low-resolution spatiotemporal features from an input clip of several consecutive frames; then a prediction network decodes the encoded features spatially while aggregating all the temporal information. As a result, a single prediction map is produced from an input clip of multiple frames. Frame-wise saliency maps can be predicted by applying TASED-Net to a video in a sliding-window fashion. The proposed approach assumes that the saliency map of any frame can be predicted by considering a limited number of past frames. The results of our extensive experiments on video saliency detection validate this assumption and demonstrate that our fully-convolutional model with temporal aggregation is effective. TASED-Net significantly outperforms previous state-of-the-art approaches on all three major large-scale datasets for video saliency detection: DHF1K, Hollywood2, and UCFSports. After analyzing the results qualitatively, we observe that our model is particularly good at attending to salient moving objects.
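
The sliding-window inference described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `predict_clip` is a hypothetical stand-in for the network (it simply averages over time), the clip length of 4, the grayscale `(T, H, W)` frame layout, and the choice to pad the start of the video by repeating the first frame are all assumptions made for the example.

```python
import numpy as np


def predict_clip(clip):
    """Hypothetical stand-in for the network: maps a (T, H, W) clip to one (H, W) map."""
    # Placeholder aggregation: average over the temporal axis, then scale to [0, 1].
    agg = clip.mean(axis=0)
    span = agg.max() - agg.min()
    if span == 0:
        return np.zeros_like(agg)
    return (agg - agg.min()) / span


def sliding_window_saliency(video, clip_len=4, model=predict_clip):
    """Predict one saliency map per frame from the current frame and clip_len - 1 past frames."""
    num_frames = video.shape[0]
    # Assumption: repeat the first frame so that early frames still get a full clip.
    pad = np.repeat(video[:1], clip_len - 1, axis=0)
    padded = np.concatenate([pad, video], axis=0)
    # The window padded[t : t + clip_len] ends at original frame t.
    maps = [model(padded[t:t + clip_len]) for t in range(num_frames)]
    return np.stack(maps)


# Usage: a toy 10-frame video of 8x8 frames yields 10 saliency maps of 8x8.
video = np.arange(10 * 8 * 8, dtype=np.float64).reshape(10, 8, 8)
saliency = sliding_window_saliency(video)
```

Because each window ends at the frame being predicted, the map for frame `t` depends only on that frame and a fixed number of past frames, which mirrors the assumption stated in the abstract.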

Cite

Text

Min and Corso. "TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019. doi:10.1109/ICCV.2019.00248

Markdown

[Min and Corso. "TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.](https://mlanthology.org/iccv/2019/min2019iccv-tasednet/) doi:10.1109/ICCV.2019.00248

BibTeX

@inproceedings{min2019iccv-tasednet,
  title     = {{TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection}},
  author    = {Min, Kyle and Corso, Jason J.},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2019},
  doi       = {10.1109/ICCV.2019.00248},
  url       = {https://mlanthology.org/iccv/2019/min2019iccv-tasednet/}
}