Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models
Abstract
Advancements in text-to-image diffusion models have enabled a broad range of practical downstream applications, but such models often suffer from misalignment between text and image. Consider generating a combination of two disentangled concepts: given the prompt "a tea cup of iced coke", existing models usually generate a glass cup of iced coke, because iced coke typically co-occurs with glass cups rather than tea cups in the training data. The root of such misalignment is attributed to confusion in the latent semantic space of text-to-image diffusion models, and hence we refer to the "a tea cup of iced coke" phenomenon as Latent Concept Misalignment (LC-Mis). We leverage large language models (LLMs) to thoroughly investigate the scope of LC-Mis, and develop an automated pipeline for aligning the latent semantics of diffusion models to text prompts. Empirical assessments confirm the effectiveness of our approach, which substantially reduces LC-Mis errors and enhances the robustness and versatility of text-to-image diffusion models. Our code and dataset are available online.
Cite
Zhao et al. "Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72890-7_19
[Zhao et al. "Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/zhao2024eccv-lost/) doi:10.1007/978-3-031-72890-7_19
@inproceedings{zhao2024eccv-lost,
title = {{Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models}},
author = {Zhao, Juntu and Deng, Junyu and Ye, Yixin and Li, Chongxuan and Deng, Zhijie and Wang, Dequan},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-72890-7_19},
url = {https://mlanthology.org/eccv/2024/zhao2024eccv-lost/}
}