On the Robustness of Diffusion Inversion in Image Manipulation
Abstract
Text-guided image editing is a rapidly growing field driven by the development of large diffusion models. In this work, we present an effective approach to a key step of real-image editing, known as "inversion": finding the initial noise vector that reconstructs the input image when conditioned on a text prompt. Existing approaches to conditional inversion are often unstable and inaccurate, leading to distorted image manipulation. To address these challenges, our method starts by analyzing the inconsistent assumptions and accumulated errors that make the mathematical inverse problem ill-posed. We then introduce learnable latent variables as bias corrections to approximate an invertible, bijective inversion. We perform latent trajectory optimization with a prior to fully invert the image, optimizing the bias corrections on the unconditional text prompt and the initial noise vector. Our method builds on the publicly available Stable Diffusion model and is extensively evaluated on a variety of images and prompt edits, demonstrating higher accuracy, robustness, and quality than state-of-the-art baseline approaches.
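The core idea can be illustrated with a toy example of our own construction (not the paper's actual implementation): a linear per-step "denoiser" whose inversion pass uses slightly mismatched scale factors, modeling the inconsistent assumptions and accumulated error the abstract describes. Learnable per-step bias corrections are then optimized so the inversion-reconstruction round trip reproduces the input almost exactly; all names and constants below are illustrative assumptions.

```python
# Hypothetical sketch: a scalar latent, a linear reconstruction step
# z -> c[t] * z, and an inversion pass that mistakenly uses c_inv[t] != c[t].
T = 10
c = [0.95 + 0.01 * t for t in range(T)]        # "true" reconstruction scales
c_inv = [s + 0.01 for s in c]                  # mismatched inversion scales

x0 = 3.0                                       # the "image" (a scalar latent)

# Approximate inversion: run the (slightly wrong) inverse map to get z_T.
zT = x0
for t in range(T):
    zT /= c_inv[t]

def reconstruct(zT, b):
    """Reconstruct x0 from z_T, adding a learnable bias correction b[t]
    at every reverse step."""
    z = zT
    for t in reversed(range(T)):
        z = c[t] * z + b[t]
    return z

# Without correction, the round trip drifts away from the input.
err_plain = abs(reconstruct(zT, [0.0] * T) - x0)

# The toy model is linear, so the gradient of the squared reconstruction
# error w.r.t. b[t] is analytic: prefix[t] is the product of the scales
# applied after step t's bias enters the trajectory.
prefix = [1.0] * T
for t in range(1, T):
    prefix[t] = prefix[t - 1] * c[t - 1]

b = [0.0] * T
lr = 0.1 / sum(p * p for p in prefix)          # step size chosen for stability
for _ in range(300):
    r = reconstruct(zT, b) - x0
    b = [bi - lr * 2.0 * r * p for bi, p in zip(b, prefix)]

err_corrected = abs(reconstruct(zT, b) - x0)
print(err_plain, err_corrected)
```

In this linear toy, gradient descent on the biases drives the reconstruction error to machine precision, while the uncorrected round trip retains a visible drift; the paper's method applies the analogous correction along the full diffusion latent trajectory of Stable Diffusion.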
Cite
Text
Zhang et al. "On the Robustness of Diffusion Inversion in Image Manipulation." ICLR 2023 Workshops: RTML, 2023.
Markdown
[Zhang et al. "On the Robustness of Diffusion Inversion in Image Manipulation." ICLR 2023 Workshops: RTML, 2023.](https://mlanthology.org/iclrw/2023/zhang2023iclrw-robustness/)
BibTeX
@inproceedings{zhang2023iclrw-robustness,
title = {{On the Robustness of Diffusion Inversion in Image Manipulation}},
author = {Zhang, Jiaxin and Das, Kamalika and Kumar, Sricharan},
booktitle = {ICLR 2023 Workshops: RTML},
year = {2023},
url = {https://mlanthology.org/iclrw/2023/zhang2023iclrw-robustness/}
}