Show Me a Story: Towards Coherent Neural Story Illustration
Abstract
We propose an end-to-end network for the visual illustration of a sequence of sentences forming a story. At the core of our model is the ability to model the inter-related nature of the sentences within a story, as well as the ability to learn coherence to support reference resolution. The framework takes the form of an encoder-decoder architecture: sentences are encoded using a hierarchical two-level sentence-story GRU, combined with an encoding of coherence, and sequentially decoded, using predicted feature representations, into a consistent illustrative image sequence. We optimize all parameters of our network end-to-end with respect to an order-embedding loss that encodes entailment between images and sentences. Experiments on the VIST storytelling dataset highlight the importance of our algorithmic choices and the efficacy of our overall model.
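As a rough illustration of what an order-embedding loss over image and sentence features can look like, the sketch below uses the order-violation penalty common in the order-embeddings literature, paired with a margin-based ranking term. This is not taken from the paper: the function names, the margin value, the choice of the image as the more specific element, and the use of plain Python lists as feature vectors are all assumptions made for illustration.

```python
def order_violation(specific, general):
    """Penalty for breaking the assumed partial order.

    Returns the squared norm of max(0, general - specific), elementwise.
    It is zero when every coordinate of `specific` is at least as large
    as the matching coordinate of `general`.
    """
    return sum(max(0.0, g - s) ** 2 for s, g in zip(specific, general))


def ranking_loss(image, sentence, wrong_image, margin=0.05):
    """Hinge loss asking the matching image to violate the order less
    than a mismatched image does, by at least `margin` (assumed value)."""
    positive = order_violation(image, sentence)
    negative = order_violation(wrong_image, sentence)
    return max(0.0, margin + positive - negative)


# Hypothetical 2-D features: the matching image dominates the sentence
# vector, the mismatched image does not.
sentence_vec = [1.0, 1.0]
matching_image = [2.0, 3.0]
mismatched_image = [0.0, 0.0]

loss_good = ranking_loss(matching_image, sentence_vec, mismatched_image)
loss_bad = ranking_loss(mismatched_image, sentence_vec, matching_image)
```

In this toy setup the loss vanishes when the matching image already satisfies the order and the mismatched one violates it by more than the margin; swapping the two images produces a positive loss that training would push down.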
Cite
Text
Ravi et al. "Show Me a Story: Towards Coherent Neural Story Illustration." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018. doi:10.1109/CVPR.2018.00794
Markdown
[Ravi et al. "Show Me a Story: Towards Coherent Neural Story Illustration." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.](https://mlanthology.org/cvpr/2018/ravi2018cvpr-show/) doi:10.1109/CVPR.2018.00794
BibTeX
@inproceedings{ravi2018cvpr-show,
title = {{Show Me a Story: Towards Coherent Neural Story Illustration}},
author = {Ravi, Hareesh and Wang, Lezi and Muniz, Carlos and Sigal, Leonid and Metaxas, Dimitris and Kapadia, Mubbasir},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2018},
doi = {10.1109/CVPR.2018.00794},
url = {https://mlanthology.org/cvpr/2018/ravi2018cvpr-show/}
}