TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection

Abstract

TASED-Net is a 3D fully-convolutional network architecture for video saliency detection. It consists of two building blocks: first, the encoder network extracts low-resolution spatiotemporal features from an input clip of several consecutive frames; then the prediction network decodes the encoded features spatially while aggregating all the temporal information. As a result, a single prediction map is produced from an input clip of multiple frames. Frame-wise saliency maps can be predicted by applying TASED-Net to a video in a sliding-window fashion. The proposed approach assumes that the saliency map of any frame can be predicted by considering a limited number of past frames. The results of our extensive experiments on video saliency detection validate this assumption and demonstrate that our fully-convolutional model with temporal aggregation is effective. TASED-Net significantly outperforms previous state-of-the-art approaches on all three major large-scale datasets for video saliency detection: DHF1K, Hollywood2, and UCFSports. After analyzing the results qualitatively, we observe that our model is particularly good at attending to salient moving objects.
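The sliding-window procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `average_clip` is a hypothetical stand-in for the network (it only averages the frames of a clip), the window length `clip_len` is an arbitrary illustrative value, and skipping the first `clip_len - 1` frames, which lack enough past frames, is an assumption made here for simplicity.

```python
def average_clip(clip):
    """Hypothetical stand-in for the network: collapses T frames into one H x W map."""
    num_frames = len(clip)
    height, width = len(clip[0]), len(clip[0][0])
    return [
        [sum(frame[y][x] for frame in clip) / num_frames for x in range(width)]
        for y in range(height)
    ]


def sliding_window_saliency(frames, predict_clip, clip_len):
    """Predict one map per frame from that frame and its clip_len - 1 past frames.

    Frames without enough past frames are skipped (an assumption of this sketch).
    """
    maps = {}
    for t in range(clip_len - 1, len(frames)):
        clip = frames[t - clip_len + 1 : t + 1]  # frame t plus its past frames
        maps[t] = predict_clip(clip)             # a single map for the whole clip
    return maps


# Toy video: 10 frames of size 2 x 3, where frame i is filled with the value i.
video = [[[float(i)] * 3 for _ in range(2)] for i in range(10)]
maps = sliding_window_saliency(video, average_clip, clip_len=4)
```

Each output map depends only on a fixed number of past frames, which mirrors the assumption stated in the abstract; in practice `predict_clip` would be replaced by a trained model that maps a clip to a single saliency map.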

Cite

Text

Min and Corso. "TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019. doi:10.1109/ICCV.2019.00248

Markdown

[Min and Corso. "TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.](https://mlanthology.org/iccv/2019/min2019iccv-tasednet/) doi:10.1109/ICCV.2019.00248

BibTeX

@inproceedings{min2019iccv-tasednet,
  title     = {{TASED-Net: Temporally-Aggregating Spatial Encoder-Decoder Network for Video Saliency Detection}},
  author    = {Min, Kyle and Corso, Jason J.},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2019},
  doi       = {10.1109/ICCV.2019.00248},
  url       = {https://mlanthology.org/iccv/2019/min2019iccv-tasednet/}
}