Universal Guidance for Diffusion Models

Abstract

Typical diffusion models are trained to accept a particular form of conditioning, most commonly text, and cannot be conditioned on other modalities without retraining. In this work, we propose a universal guidance algorithm that enables diffusion models to be controlled by arbitrary guidance modalities without the need to retrain any use-specific components. We show that our algorithm successfully generates high-quality images with guidance functions including segmentation, face recognition, object detection, style guidance, and classifier signals.
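For a concrete picture of the core idea, the paper's forward universal guidance perturbs the noise prediction at each denoising step using the gradient of an off-the-shelf guidance loss evaluated on the predicted clean image. The sketch below is a minimal PyTorch illustration under stated assumptions: `eps_model`, `guidance_loss`, and `guided_eps` are hypothetical names for this sketch, not the authors' released code, and the x̂₀ identity is the standard DDPM/DDIM one.

```python
import torch

def guided_eps(eps_model, x_t, t, alpha_bar_t, guidance_loss, guidance_scale):
    """One forward-guidance step: shift the noise estimate by the gradient of
    a guidance loss computed on the predicted clean image x0_hat."""
    x_t = x_t.detach().requires_grad_(True)
    eps = eps_model(x_t, t)
    # Standard identity: x_t = sqrt(a_bar)*x0 + sqrt(1-a_bar)*eps  =>  solve for x0.
    x0_hat = (x_t - torch.sqrt(1 - alpha_bar_t) * eps) / torch.sqrt(alpha_bar_t)
    loss = guidance_loss(x0_hat)
    grad = torch.autograd.grad(loss, x_t)[0]
    # Any constant factors (e.g. sqrt(1 - a_bar)) are folded into guidance_scale here.
    return (eps + guidance_scale * grad).detach()

# Toy usage: a dummy denoiser and an L2 guidance loss pulling x0_hat toward a target.
eps_model = lambda x, t: torch.zeros_like(x)          # stand-in for a trained UNet
target = torch.zeros(1, 3, 64, 64)
guidance_loss = lambda x0: ((x0 - target) ** 2).mean()
x_t = torch.randn(1, 3, 64, 64)
eps_hat = guided_eps(eps_model, x_t, t=torch.tensor(10),
                     alpha_bar_t=torch.tensor(0.5),
                     guidance_loss=guidance_loss, guidance_scale=2.0)
```

Because the guidance loss is applied to x̂₀ rather than the noisy x_t, any frozen network that operates on clean images (a segmenter, face recognizer, or detector) can serve as `guidance_loss` without retraining; the full method in the paper additionally uses backward guidance and per-step self-recurrence, which this sketch omits.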

Cite

Text

Bansal et al. "Universal Guidance for Diffusion Models." International Conference on Learning Representations, 2024.

Markdown

[Bansal et al. "Universal Guidance for Diffusion Models." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/bansal2024iclr-universal/)

BibTeX

@inproceedings{bansal2024iclr-universal,
  title     = {{Universal Guidance for Diffusion Models}},
  author    = {Bansal, Arpit and Chu, Hong-Min and Schwarzschild, Avi and Sengupta, Roni and Goldblum, Micah and Geiping, Jonas and Goldstein, Tom},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/bansal2024iclr-universal/}
}