NULL-Text Inversion for Editing Real Images Using Guided Diffusion Models

Abstract

Recent large-scale text-guided diffusion models provide powerful image generation capabilities. Considerable effort is currently devoted to modifying these images using text alone, as a means of offering intuitive and versatile editing tools. To edit a real image using these state-of-the-art tools, one must first invert the image, together with a meaningful text prompt, into the pretrained model's domain. In this paper, we introduce an accurate inversion technique and thus facilitate intuitive text-based modification of the image. Our proposed inversion consists of two key novel components: (i) Pivotal inversion for diffusion models. While current methods aim at mapping random noise samples to a single input image, we use a single pivotal noise vector for each timestep and optimize around it. We recognize that a direct DDIM inversion is inadequate on its own, but it does provide a rather good anchor for our optimization. (ii) NULL-text optimization, where we modify only the unconditional textual embedding that is used for classifier-free guidance, rather than the input text embedding. This keeps both the model weights and the conditional embedding intact and hence enables prompt-based editing while avoiding cumbersome tuning of the model's weights. Our NULL-text inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompt edits, showing high-fidelity editing of real images.
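The two components described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the latent is a single scalar, `eps_model` is a hypothetical hand-written stand-in for the Stable Diffusion denoiser, the noise schedule `ALPHA_BAR` is invented, and a finite-difference gradient stands in for automatic differentiation. Only the overall shape follows the abstract: DDIM inversion supplies one pivot per timestep, and then only the unconditional ("null") embedding is optimized, per timestep, so that guided sampling retraces the pivots.

```python
import math

# Toy, hypothetical setup (assumptions, not values from the paper).
T = 10                                               # number of diffusion steps
GUIDANCE = 7.5                                       # classifier-free guidance scale
ALPHA_BAR = [1.0 - 0.09 * t for t in range(T + 1)]   # invented noise schedule


def eps_model(z, t, emb):
    """Stand-in noise predictor; a real system would call the pretrained model."""
    return 0.3 * z + 0.5 * emb + 0.01 * t


def ddim_coeffs(a_from, a_to):
    """Coefficients (c_z, c_eps) of a deterministic DDIM move between noise levels."""
    c_z = math.sqrt(a_to / a_from)
    c_eps = math.sqrt(a_to) * (math.sqrt(1 / a_to - 1) - math.sqrt(1 / a_from - 1))
    return c_z, c_eps


def ddim_invert(z0, cond):
    """Pivot trajectory z*_0 .. z*_T, using the conditional prediction only."""
    traj = [z0]
    for t in range(T):
        c_z, c_eps = ddim_coeffs(ALPHA_BAR[t], ALPHA_BAR[t + 1])
        traj.append(c_z * traj[-1] + c_eps * eps_model(traj[-1], t, cond))
    return traj


def denoise_step(z, t, cond, null, w):
    """One guided DDIM step from t to t-1 with classifier-free guidance."""
    e_cond = eps_model(z, t, cond)
    e_null = eps_model(z, t, null)
    eps = e_null + w * (e_cond - e_null)
    c_z, c_eps = ddim_coeffs(ALPHA_BAR[t], ALPHA_BAR[t - 1])
    return c_z * z + c_eps * eps


def null_text_inversion(pivots, cond, w=GUIDANCE, inner_steps=200, lr=0.3, h=1e-4):
    """Optimize one null embedding per timestep; model and cond stay fixed."""
    z, null, nulls = pivots[T], 0.0, {}
    for t in range(T, 0, -1):
        target = pivots[t - 1]

        def loss(n):
            return (denoise_step(z, t, cond, n, w) - target) ** 2

        for _ in range(inner_steps):
            grad = (loss(null + h) - loss(null - h)) / (2 * h)  # stands in for autograd
            null -= lr * grad
        nulls[t] = null                           # warm-starts the next timestep
        z = denoise_step(z, t, cond, null, w)     # continue from the optimized step
    return nulls, z


# Usage: reconstruct a toy "image" latent with and without the optimized nulls.
z0, cond = 0.8, 1.0
pivots = ddim_invert(z0, cond)
nulls, recon = null_text_inversion(pivots, cond)

naive = pivots[T]
for t in range(T, 0, -1):
    naive = denoise_step(naive, t, cond, 0.0, GUIDANCE)   # fixed null embedding
```

In this toy setting, `recon` should land very close to `z0`, while `naive` (the same guidance scale with an untouched null embedding) drifts away from it. An edit would then be made by changing `cond` while reusing the stored `nulls`, which is the property the abstract attributes to leaving the conditional embedding and model weights intact.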

Cite

Text

Mokady et al. "NULL-Text Inversion for Editing Real Images Using Guided Diffusion Models." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00585

Markdown

[Mokady et al. "NULL-Text Inversion for Editing Real Images Using Guided Diffusion Models." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/mokady2023cvpr-nulltext/) doi:10.1109/CVPR52729.2023.00585

BibTeX

@inproceedings{mokady2023cvpr-nulltext,
  title     = {{NULL-Text Inversion for Editing Real Images Using Guided Diffusion Models}},
  author    = {Mokady, Ron and Hertz, Amir and Aberman, Kfir and Pritch, Yael and Cohen-Or, Daniel},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {6038--6047},
  doi       = {10.1109/CVPR52729.2023.00585},
  url       = {https://mlanthology.org/cvpr/2023/mokady2023cvpr-nulltext/}
}