Universal Guidance for Diffusion Models

Abstract

Typical diffusion models are trained to accept a particular form of conditioning, most commonly text, and cannot be conditioned on other modalities without retraining. In this work, we propose a universal guidance algorithm that enables diffusion models to be controlled by arbitrary guidance modalities without the need to retrain any use-specific components. We show that our algorithm successfully generates high-quality images with guidance functions including segmentation, face recognition, object detection, and classifier signals. Code is available at github.com/arpitbansal297/Universal-Guided-Diffusion.
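
To make the idea concrete, below is a minimal PyTorch sketch of one guided denoising step in the spirit of the paper's forward guidance: the predicted clean image is recovered from the noisy sample, an off-the-shelf guidance network scores it, and the gradient of that loss nudges the noise prediction. This is an illustrative sketch, not the repository's implementation; the names eps_model, guidance_loss, and scale are placeholders, and the noise-schedule handling is simplified to a single scalar.

import torch

def guided_denoising_step(z_t, t, alpha_bar, eps_model, guidance_loss, scale=1.0):
    """One reverse-diffusion step with forward universal guidance (sketch).

    z_t           : current noisy sample, shape (B, C, H, W)
    alpha_bar     : cumulative noise-schedule value at step t (scalar tensor)
    eps_model     : pretrained noise predictor eps_theta(z_t, t)  [placeholder]
    guidance_loss : callable mapping a predicted clean image to a scalar loss,
                    e.g. cross-entropy of an off-the-shelf classifier [placeholder]
    """
    z_t = z_t.detach().requires_grad_(True)
    eps = eps_model(z_t, t)

    # Predict the clean image x0_hat from the noisy sample (standard identity).
    x0_hat = (z_t - torch.sqrt(1.0 - alpha_bar) * eps) / torch.sqrt(alpha_bar)

    # Evaluate the external guidance function on the *predicted clean* image,
    # so a network trained only on clean images can still supply gradients.
    loss = guidance_loss(x0_hat)
    grad = torch.autograd.grad(loss, z_t)[0]

    # Shift the noise prediction along the guidance gradient (forward guidance),
    # analogous to classifier guidance but with an arbitrary loss.
    eps_hat = eps + scale * torch.sqrt(1.0 - alpha_bar) * grad
    return eps_hat.detach()

Because the guidance loss is evaluated on x0_hat rather than the noisy sample, any frozen recognition network (segmentation, face recognition, object detection, classification) can serve as guidance_loss without retraining, which is the point of the abstract's claim.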

Cite

Text

Bansal et al. "Universal Guidance for Diffusion Models." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00091

Markdown

[Bansal et al. "Universal Guidance for Diffusion Models." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/bansal2023cvprw-universal/) doi:10.1109/CVPRW59228.2023.00091

BibTeX

@inproceedings{bansal2023cvprw-universal,
  title     = {{Universal Guidance for Diffusion Models}},
  author    = {Bansal, Arpit and Chu, Hong-Min and Schwarzschild, Avi and Sengupta, Soumyadip and Goldblum, Micah and Geiping, Jonas and Goldstein, Tom},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2023},
  pages     = {843--852},
  doi       = {10.1109/CVPRW59228.2023.00091},
  url       = {https://mlanthology.org/cvprw/2023/bansal2023cvprw-universal/}
}