Video Captioning with Transferred Semantic Attributes
Abstract
Automatically generating natural language descriptions of videos poses a fundamental challenge for the computer vision community. Most recent progress on this problem has been achieved by employing 2-D and/or 3-D Convolutional Neural Networks (CNNs) to encode video content and Recurrent Neural Networks (RNNs) to decode a sentence. In this paper, we present Long Short-Term Memory with Transferred Semantic Attributes (LSTM-TSA), a novel deep architecture that incorporates the transferred semantic attributes learnt from images and videos into the CNN plus RNN framework by training them in an end-to-end manner. The design of LSTM-TSA is motivated by two observations: 1) semantic attributes contribute significantly to captioning, and 2) images and videos carry complementary semantics and thus can reinforce each other for captioning. To boost video captioning, we propose a novel transfer unit to model the mutually correlated attributes learnt from images and videos. Extensive experiments are conducted on three public datasets, i.e., MSVD, M-VAD and MPII-MD. Our proposed LSTM-TSA achieves the best published performance to date in sentence generation on MSVD: 52.8% and 74.0% in terms of BLEU@4 and CIDEr-D, respectively. Superior results are also reported on M-VAD and MPII-MD when compared to state-of-the-art methods.
Cite
Text
Pan et al. "Video Captioning with Transferred Semantic Attributes." Conference on Computer Vision and Pattern Recognition, 2017. doi:10.1109/CVPR.2017.111
Markdown
[Pan et al. "Video Captioning with Transferred Semantic Attributes." Conference on Computer Vision and Pattern Recognition, 2017.](https://mlanthology.org/cvpr/2017/pan2017cvpr-video/) doi:10.1109/CVPR.2017.111
BibTeX
@inproceedings{pan2017cvpr-video,
title = {{Video Captioning with Transferred Semantic Attributes}},
author = {Pan, Yingwei and Yao, Ting and Li, Houqiang and Mei, Tao},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2017},
doi = {10.1109/CVPR.2017.111},
url = {https://mlanthology.org/cvpr/2017/pan2017cvpr-video/}
}