Making Text-to-Image Diffusion Models Zero-Shot Image-to-Image Editors by Inferring "Random Seeds"

Abstract

Recent text-to-image diffusion models trained on large-scale data achieve remarkable performance on text-conditioned image synthesis (e.g., GLIDE, DALL·E 2, Imagen, Stable Diffusion). This paper introduces a simple method for using stochastic text-to-image diffusion models as zero-shot image editors. Our method, CycleDiffusion, is based on the finding that when all random variables (the "random seed") are fixed, two similar text prompts produce similar images. The core idea is to infer the random variables that are likely to generate a source image conditioned on a source text. With the inferred random variables, the text-to-image diffusion model then generates a target image conditioned on a target text. Our experiments show that CycleDiffusion outperforms SDEdit and the ODE-based DDIB, and that it can be further improved with Cross Attention Control. Demo: https://huggingface.co/spaces/ChenWu98/Stable-CycleDiffusion.
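
Usage sketch

The snippet below is a minimal usage sketch of this editing procedure, assuming the CycleDiffusionPipeline available in the Hugging Face diffusers library (which backs the demo above). The checkpoint name, file names, prompts, and argument values are illustrative choices, not prescribed by the paper.

import torch
from PIL import Image
from diffusers import CycleDiffusionPipeline, DDIMScheduler

# Assumed checkpoint; CycleDiffusion can be applied to any stochastic
# text-to-image diffusion model such as Stable Diffusion.
model_id = "CompVis/stable-diffusion-v1-4"

# A DDIM scheduler is used so that the per-step Gaussian noise
# (the "random seed") can be inferred from the source image.
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = CycleDiffusionPipeline.from_pretrained(
    model_id, scheduler=scheduler, torch_dtype=torch.float16
).to("cuda")

# Hypothetical local source image and prompts.
init_image = Image.open("source.png").convert("RGB").resize((512, 512))
source_prompt = "An astronaut riding a horse"
target_prompt = "An astronaut riding an elephant"

# The pipeline first infers the random variables that reconstruct the
# source image under the source prompt, then replays them under the
# target prompt to produce the edited image.
edited = pipe(
    prompt=target_prompt,
    source_prompt=source_prompt,
    image=init_image,
    num_inference_steps=100,
    eta=0.1,                  # eta > 0 keeps the sampler stochastic
    strength=0.8,
    guidance_scale=2.0,
    source_guidance_scale=1.0,
).images[0]
edited.save("edited.png")

Because the inferred noise is shared between the two prompts, the edited image preserves the layout and details of the source image wherever the prompts agree, changing only what the target text asks for.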

Cite

Text

Wu and De la Torre. "Making Text-to-Image Diffusion Models Zero-Shot Image-to-Image Editors by Inferring "Random Seeds"." NeurIPS 2022 Workshops: SBM, 2022.

Markdown

[Wu and De la Torre. "Making Text-to-Image Diffusion Models Zero-Shot Image-to-Image Editors by Inferring "Random Seeds"." NeurIPS 2022 Workshops: SBM, 2022.](https://mlanthology.org/neuripsw/2022/wu2022neuripsw-making/)

BibTeX

@inproceedings{wu2022neuripsw-making,
  title     = {{Making Text-to-Image Diffusion Models Zero-Shot Image-to-Image Editors by Inferring "Random Seeds"}},
  author    = {Wu, Chen Henry and De la Torre, Fernando},
  booktitle = {NeurIPS 2022 Workshops: SBM},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/wu2022neuripsw-making/}
}