Story Visualization by Online Text Augmentation with Context Memory

Daechul Ahn, Daneul Kim, Gwangmo Song, Seung Hwan Kim, Honglak Lee, Dongyeop Kang, Jonghyun Choi

ICCV 2023 pp. 3125-3135

doi:10.1109/ICCV51070.2023.00290

Abstract

Story visualization (SV) is a challenging text-to-image generation task because it requires not only rendering visual details from the text descriptions but also encoding a long-term context across multiple sentences. While prior efforts mostly focus on generating a semantically relevant image for each sentence, encoding a context spread across the given paragraph to generate contextually convincing images (e.g., with the correct character or a proper scene background) remains a challenge. To this end, we propose a novel memory architecture for the bi-directional Transformer framework, together with an online text augmentation that generates multiple pseudo-descriptions as supplementary supervision during training for better generalization to language variations at inference. In extensive experiments on two popular SV benchmarks, Pororo-SV and Flintstones-SV, the proposed method significantly outperforms the state of the art in various metrics, including FID, character F1, frame accuracy, BLEU-2/3, and R-precision, with similar or lower computational complexity.
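Character F1 and frame accuracy, two of the metrics listed above, are typically computed from a pretrained character classifier's predictions on each generated frame. The sketch below is a minimal illustration, assuming binary character-presence vectors per frame, micro-averaged F1, and exact-match frame accuracy (a common convention for Pororo-SV, though the paper's exact evaluation code is not reproduced here). The function name is hypothetical and the classifier itself is omitted.

```python
from typing import Sequence, Tuple


def character_f1_and_frame_accuracy(
    preds: Sequence[Sequence[int]],
    labels: Sequence[Sequence[int]],
) -> Tuple[float, float]:
    """Hypothetical helper: micro-averaged character F1 and exact-match frame accuracy.

    Each element of preds/labels is a 0/1 vector for one frame; position k
    indicates whether character k is present (assumed encoding).
    """
    if len(preds) != len(labels):
        raise ValueError("preds and labels must cover the same number of frames")
    tp = fp = fn = 0
    exact = 0
    for p, y in zip(preds, labels):
        if len(p) != len(y):
            raise ValueError("prediction and label vectors must have equal length")
        # Accumulate true/false positives and false negatives over all characters.
        for pk, yk in zip(p, y):
            if pk and yk:
                tp += 1
            elif pk and not yk:
                fp += 1
            elif yk and not pk:
                fn += 1
        # A frame counts as correct only if every character prediction matches.
        if list(p) == list(y):
            exact += 1
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    frame_acc = exact / len(labels) if labels else 0.0
    return f1, frame_acc


# Two frames, three characters: one spurious character in the first frame.
f1, acc = character_f1_and_frame_accuracy([[1, 0, 1], [0, 1, 0]], [[1, 0, 0], [0, 1, 0]])
print(f1, acc)
```

Micro-averaging pools counts across all characters and frames, so frequent characters weigh more; exact-match frame accuracy is the stricter of the two, since a single wrong character makes the whole frame count as incorrect.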


Cite

Text

Ahn et al. "Story Visualization by Online Text Augmentation with Context Memory." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00290

Markdown

[Ahn et al. "Story Visualization by Online Text Augmentation with Context Memory." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/ahn2023iccv-story/) doi:10.1109/ICCV51070.2023.00290

BibTeX

@inproceedings{ahn2023iccv-story,
  title     = {{Story Visualization by Online Text Augmentation with Context Memory}},
  author    = {Ahn, Daechul and Kim, Daneul and Song, Gwangmo and Kim, Seung Hwan and Lee, Honglak and Kang, Dongyeop and Choi, Jonghyun},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {3125--3135},
  doi       = {10.1109/ICCV51070.2023.00290},
  url       = {https://mlanthology.org/iccv/2023/ahn2023iccv-story/}
}