The Hidden Language of Diffusion Models

Abstract

Text-to-image diffusion models have demonstrated an unparalleled ability to generate high-quality, diverse images from a textual prompt. However, the internal representations learned by these models remain an enigma. In this work, we present Conceptor, a novel method to interpret the internal representation of a textual concept by a diffusion model. This interpretation is obtained by decomposing the concept into a small set of human-interpretable textual elements. Applied to the state-of-the-art Stable Diffusion model, Conceptor reveals non-trivial structures in the representations of concepts. For example, we find surprising visual connections between concepts that transcend their textual semantics. We additionally discover concepts that rely on mixtures of exemplars, biases, renowned artistic styles, or a simultaneous fusion of multiple meanings of the concept. Through a large battery of experiments, we demonstrate Conceptor's ability to provide meaningful, robust, and faithful decompositions for a wide variety of abstract, concrete, and complex textual concepts, while naturally connecting each decomposition element to its corresponding visual impact on the generated images.
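
The core idea of the abstract, decomposing a concept into a small weighted set of human-interpretable textual elements, can be illustrated with a toy sketch. The snippet below is not the paper's implementation; the random embeddings, vocabulary size, greedy selection, and plain least-squares fit are all illustrative assumptions standing in for the actual learned decomposition.

```python
# Minimal sketch (assumed setup, not Conceptor's actual training procedure):
# approximate a concept's embedding as a sparse, non-negative combination of a
# small number of word embeddings, so each weighted word plays the role of one
# human-interpretable decomposition element.
import numpy as np

rng = np.random.default_rng(0)
d, vocab_size, n_elements = 64, 500, 5

W = rng.normal(size=(vocab_size, d))   # toy word-embedding matrix (one row per word)
concept = rng.normal(size=d)           # toy concept embedding to explain

# Greedy sparse coding: repeatedly pick the word whose embedding best explains
# the remaining residual, then refit non-negative weights over the chosen words.
selected, residual, coef = [], concept.copy(), np.zeros(0)
for _ in range(n_elements):
    scores = W @ residual / np.linalg.norm(W, axis=1)
    scores[selected] = -np.inf          # do not reselect a word
    selected.append(int(np.argmax(scores)))
    Ws = W[selected]                    # embeddings of the selected words
    coef, *_ = np.linalg.lstsq(Ws.T, concept, rcond=None)
    coef = np.clip(coef, 0, None)       # keep weights non-negative for interpretability
    residual = concept - Ws.T @ coef

print("decomposition element indices:", selected)
print("element weights:", np.round(coef, 3))
```

In the paper's setting, the vocabulary rows would be the text encoder's token embeddings and the weights would be learned against the diffusion model's denoising objective rather than a least-squares fit; the sketch only conveys the "concept as a small weighted set of words" structure.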

Cite

Text

Chefer et al. "The Hidden Language of Diffusion Models." International Conference on Learning Representations, 2024.

Markdown

[Chefer et al. "The Hidden Language of Diffusion Models." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/chefer2024iclr-hidden/)

BibTeX

@inproceedings{chefer2024iclr-hidden,
  title     = {{The Hidden Language of Diffusion Models}},
  author    = {Chefer, Hila and Lang, Oran and Geva, Mor and Polosukhin, Volodymyr and Shocher, Assaf and Irani, Michal and Mosseri, Inbar and Wolf, Lior},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/chefer2024iclr-hidden/}
}