Universal Guidance for Diffusion Models
Abstract
Typical diffusion models are trained to accept a particular form of conditioning, most commonly text, and cannot be conditioned on other modalities without retraining. In this work, we propose a universal guidance algorithm that enables diffusion models to be controlled by arbitrary guidance modalities without the need to retrain any use-specific components. We show that our algorithm successfully generates high-quality images with guidance functions including segmentation, face recognition, object detection, and classifier signals. Code is available at github.com/arpitbansal297/Universal-Guided-Diffusion.
Cite
Text
Bansal et al. "Universal Guidance for Diffusion Models." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00091

Markdown

[Bansal et al. "Universal Guidance for Diffusion Models." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/bansal2023cvprw-universal/) doi:10.1109/CVPRW59228.2023.00091

BibTeX
@inproceedings{bansal2023cvprw-universal,
title = {{Universal Guidance for Diffusion Models}},
author = {Bansal, Arpit and Chu, Hong-Min and Schwarzschild, Avi and Sengupta, Soumyadip and Goldblum, Micah and Geiping, Jonas and Goldstein, Tom},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2023},
pages = {843-852},
doi = {10.1109/CVPRW59228.2023.00091},
url = {https://mlanthology.org/cvprw/2023/bansal2023cvprw-universal/}
}