EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models

Abstract

Recent years have witnessed remarkable progress in the image generation task, where users can create visually astonishing images of high quality. However, existing text-to-image diffusion models are proficient at generating concrete concepts (e.g., dogs) but struggle with more abstract ones (e.g., emotions). Several efforts have been made to modify image emotions through color and style adjustments, but these are limited in effectively conveying emotions when the image content is fixed. In this work, we introduce Emotional Image Content Generation (EICG), a new task that generates semantic-clear and emotion-faithful images given emotion categories. Specifically, we propose an emotion space and construct a mapping network to align it with the powerful Contrastive Language-Image Pre-training (CLIP) space, providing a concrete interpretation of abstract emotions. An attribute loss and an emotion confidence measure are further proposed to ensure the semantic diversity and emotion fidelity of the generated images. Our method outperforms state-of-the-art text-to-image approaches both quantitatively and qualitatively, evaluated with three custom metrics: emotion accuracy, semantic clarity, and semantic diversity. Beyond generation, our method can aid emotion understanding and inspire emotional art design. Project page: https://vcc.tech/research/2024/EmoGen.
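The abstract mentions a mapping network that aligns a learned emotion space with the CLIP embedding space. As a rough illustration only, the minimal sketch below shows one way such an alignment could look; the architecture, dimensions, class names (EmotionMapper, alignment_loss), and the cosine-similarity loss are all assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EmotionMapper(nn.Module):
    """Hypothetical mapping network: projects learnable emotion-space
    vectors into the CLIP embedding space (all dimensions are assumed)."""

    def __init__(self, emotion_dim: int = 128, clip_dim: int = 768, num_emotions: int = 8):
        super().__init__()
        # Learnable emotion space: one vector per emotion category (assumed design).
        self.emotion_space = nn.Embedding(num_emotions, emotion_dim)
        # Simple MLP mapper into CLIP space (illustrative architecture only).
        self.mapper = nn.Sequential(
            nn.Linear(emotion_dim, clip_dim),
            nn.GELU(),
            nn.Linear(clip_dim, clip_dim),
        )

    def forward(self, emotion_ids: torch.Tensor) -> torch.Tensor:
        return self.mapper(self.emotion_space(emotion_ids))


def alignment_loss(mapped: torch.Tensor, clip_targets: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity alignment between mapped emotion vectors and CLIP
    embeddings of emotion-related concepts (loss form is an assumption)."""
    return (1.0 - F.cosine_similarity(mapped, clip_targets, dim=-1)).mean()


if __name__ == "__main__":
    mapper = EmotionMapper()
    ids = torch.tensor([0, 3])           # placeholder emotion-category ids
    clip_targets = torch.randn(2, 768)   # stand-in for real CLIP text embeddings
    loss = alignment_loss(mapper(ids), clip_targets)
    loss.backward()
    print(f"alignment loss: {loss.item():.4f}")
```

In a setup like this, the mapped emotion vector could then condition a text-to-image diffusion model in place of (or alongside) an ordinary text embedding; how EmoGen actually injects it, and how the attribute loss and emotion confidence are computed, is described in the paper itself.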

Cite

Text

Yang et al. "EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00608

Markdown

[Yang et al. "EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/yang2024cvpr-emogen/) doi:10.1109/CVPR52733.2024.00608

BibTeX

@inproceedings{yang2024cvpr-emogen,
  title     = {{EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models}},
  author    = {Yang, Jingyuan and Feng, Jiawei and Huang, Hui},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {6358-6368},
  doi       = {10.1109/CVPR52733.2024.00608},
  url       = {https://mlanthology.org/cvpr/2024/yang2024cvpr-emogen/}
}