End-to-End Diffusion Latent Optimization Improves Classifier Guidance

Abstract

Classifier guidance (using the gradients of an image classifier to steer the generations of a diffusion model) has the potential to dramatically expand creative control over image generation and editing. Currently, however, classifier guidance requires either training new noise-aware models to obtain accurate gradients or using a one-step denoising approximation of the final generation, which leads to misaligned gradients and sub-optimal control. We highlight this approximation's shortcomings and propose a novel guidance method: Direct Optimization of Diffusion Latents (DOODL), which enables plug-and-play guidance by optimizing diffusion latents with respect to the gradients of a pre-trained classifier on the true generated pixels, using an invertible diffusion process to achieve memory-efficient backpropagation. Showcasing the potential of more precise guidance, DOODL outperforms one-step classifier guidance on computational and human evaluation metrics across four forms of guidance: using CLIP guidance to improve generations of complex prompts from DrawBench, using fine-grained visual classifiers to expand the vocabulary of Stable Diffusion, enabling image-conditioned generation with a CLIP visual encoder, and improving image aesthetics using an aesthetic scoring network.
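
The core idea of optimizing a latent against a loss computed on the final generated output can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the "sampler" is a hypothetical deterministic elementwise map, the "classifier" is a squared distance to a target vector, and the backward pass stores every intermediate state rather than using the invertible, memory-efficient process the abstract describes.

```python
import math

def step(x):
    # Hypothetical deterministic "denoising" step (illustrative only).
    return [0.9 * v + 0.1 * math.tanh(v) for v in x]

def step_grad(x):
    # Elementwise derivative of step() with respect to its input.
    return [0.9 + 0.1 * (1.0 - math.tanh(v) ** 2) for v in x]

def generate(latent, num_steps):
    # Run the full chain and keep every state for the backward pass.
    states = [list(latent)]
    for _ in range(num_steps):
        states.append(step(states[-1]))
    return states

def loss_and_grad(output, target):
    # Stand-in for a classifier loss evaluated on the final output.
    diff = [o - t for o, t in zip(output, target)]
    return 0.5 * sum(d * d for d in diff), diff

def optimize_latent(latent, target, num_steps=10, lr=0.5, iters=200):
    latent = list(latent)
    history = []
    for _ in range(iters):
        states = generate(latent, num_steps)
        loss, grad = loss_and_grad(states[-1], target)
        history.append(loss)
        # Backpropagate through every step, from the last state to the latent.
        for s in reversed(states[:-1]):
            grad = [g * d for g, d in zip(grad, step_grad(s))]
        # Gradient descent update on the latent itself.
        latent = [z - lr * g for z, g in zip(latent, grad)]
    return latent, history

latent, history = optimize_latent([1.0, 1.0, -1.0], [0.5, -0.3, 0.8])
print(f"initial loss {history[0]:.4f}, final loss {history[-1]:.2e}")
```

The gradient in this sketch is taken with respect to the output of the whole chain, which is the contrast the abstract draws with a one-step denoising approximation. A real system would replace the toy map with a diffusion sampler and the squared distance with a pre-trained classifier.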

Cite

Text

Wallace et al. "End-to-End Diffusion Latent Optimization Improves Classifier Guidance." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00669

Markdown

[Wallace et al. "End-to-End Diffusion Latent Optimization Improves Classifier Guidance." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/wallace2023iccv-endtoend/) doi:10.1109/ICCV51070.2023.00669

BibTeX

@inproceedings{wallace2023iccv-endtoend,
  title     = {{End-to-End Diffusion Latent Optimization Improves Classifier Guidance}},
  author    = {Wallace, Bram and Gokul, Akash and Ermon, Stefano and Naik, Nikhil},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {7280--7290},
  doi       = {10.1109/ICCV51070.2023.00669},
  url       = {https://mlanthology.org/iccv/2023/wallace2023iccv-endtoend/}
}