LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation

Liu, Mushui; Ma, Yuhang; Yang, Zhen; Dan, Jun; Yu, Yunlong; Zhao, Zeng; Hu, Zhipeng; Liu, Bai; Fan, Changjie

doi:10.1609/AAAI.V39I5.32588

AAAI 2025 pp. 5523-5531

Abstract

Diffusion models have exhibited substantial success in text-to-image generation. However, they often encounter challenges when dealing with complex and dense prompts involving multiple objects, attribute binding, and long descriptions. In this paper, we propose a novel framework called LLM4GEN, which enhances the semantic understanding of text-to-image diffusion models by leveraging the representations of Large Language Models (LLMs). It can be seamlessly incorporated into various diffusion models as a plug-and-play component. A specially designed Cross-Adapter Module (CAM) integrates the original text features of text-to-image models with LLM features, thereby enhancing text-to-image generation. Additionally, to better capture and correct entity-attribute relationships in text prompts, we develop an entity-guided regularization loss that further improves generation performance. We also introduce DensePrompts, which contains 7,000 dense prompts, to provide a comprehensive evaluation for the text-to-image generation task. Experiments indicate that LLM4GEN significantly improves the semantic alignment of SD1.5 and SDXL, with increases of 9.69% and 12.90%, respectively, in the color category of T2I-CompBench. Moreover, it surpasses existing models in terms of sample quality, image-text alignment, and human evaluation.
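The abstract describes the Cross-Adapter Module only at a high level. The sketch below illustrates one plausible way such a module could fuse the two feature streams, assuming (1) a linear projection of LLM token features down to the text-encoder width, (2) cross-attention in which the original text features are the queries and the projected LLM features are the keys and values, and (3) a residual add so the output keeps the shape the diffusion model already expects. The function names, weight matrices, and toy dimensions are all hypothetical; this is not the authors' implementation. Plain Python lists are used so the example runs without dependencies.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # row-major: one row per token


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (n x k) @ (k x m) matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax, shifted by the row max for numerical stability."""
    out = []
    for row in m:
        top = max(row)
        exps = [math.exp(v - top) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    """Hypothetical weight initializer: small Gaussian entries."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


def cross_adapter(text_feats: Matrix, llm_feats: Matrix,
                  w_proj: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Fuse LLM token features into the original text features.

    Assumed design (illustrative, not the paper's exact CAM):
      1. project LLM features (width d_llm) to the text-encoder width d;
      2. cross-attention: text tokens are queries, projected LLM tokens
         are keys and values;
      3. residual add, so the output has the same shape as text_feats.
    """
    llm = matmul(llm_feats, w_proj)              # (M, d)
    q = matmul(text_feats, w_q)                  # (N, d)
    k = matmul(llm, w_k)                         # (M, d)
    v = matmul(llm, w_v)                         # (M, d)
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]         # (d, M)
    scores = [[s * scale for s in row] for row in matmul(q, k_t)]  # (N, M)
    attn = softmax_rows(scores)                  # each row sums to 1
    fused = matmul(attn, v)                      # (N, d)
    return [[t + f for t, f in zip(t_row, f_row)]
            for t_row, f_row in zip(text_feats, fused)]


if __name__ == "__main__":
    rng = random.Random(0)
    n_text, d = 4, 8        # toy sizes: 4 text tokens, text-encoder width 8
    n_llm, d_llm = 6, 16    # toy sizes: 6 LLM tokens, LLM hidden width 16
    text = random_matrix(n_text, d, rng)
    llm = random_matrix(n_llm, d_llm, rng)
    out = cross_adapter(text, llm,
                        random_matrix(d_llm, d, rng),
                        random_matrix(d, d, rng),
                        random_matrix(d, d, rng),
                        random_matrix(d, d, rng))
    print(len(out), len(out[0]))  # prints: 4 8
```

One property of this assumed design: if the value projection `w_v` is all zeros, the module returns the original text features unchanged. Zero-initializing that projection is a common way to attach a plug-and-play adapter to a pretrained model without disturbing its behavior at the start of training, though the abstract does not say whether LLM4GEN does this.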

Cite

Text

Liu et al. "LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I5.32588

Markdown

[Liu et al. "LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/liu2025aaai-llm/) doi:10.1609/AAAI.V39I5.32588

BibTeX

@inproceedings{liu2025aaai-llm,
  title     = {{LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation}},
  author    = {Liu, Mushui and Ma, Yuhang and Yang, Zhen and Dan, Jun and Yu, Yunlong and Zhao, Zeng and Hu, Zhipeng and Liu, Bai and Fan, Changjie},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {5523-5531},
  doi       = {10.1609/AAAI.V39I5.32588},
  url       = {https://mlanthology.org/aaai/2025/liu2025aaai-llm/}
}