Making Text-to-Image Diffusion Models Zero-Shot Image-to-Image Editors by Inferring "Random Seeds"
Abstract
Recent text-to-image diffusion models trained on large-scale data achieve remarkable performance on text-conditioned image synthesis (e.g., GLIDE, DALL∙E 2, Imagen, Stable Diffusion). This paper introduces a simple method to use stochastic text-to-image diffusion models as zero-shot image editors. Our method, CycleDiffusion, is based on the finding that when all random variables (or "random seed") are fixed, two similar text prompts will produce similar images. The core of our idea is to infer the random variables that are likely to generate a source image conditioned on a source text. With the inferred random variables, the text-to-image diffusion model then generates a target image conditioned on a target text. Our experiments show that CycleDiffusion outperforms SDEdit and the ODE-based DDIB method, and it can be further improved by Cross Attention Control. Demo: https://huggingface.co/spaces/ChenWu98/Stable-CycleDiffusion.
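To make the two steps in the abstract concrete, below is a minimal sketch (PyTorch-style Python, not the authors' code) of inferring the "random seed" from a source image and reusing it with a target prompt. The names mu_theta, sigma, and q_sample_trajectory are hypothetical stand-ins for a diffusion model's posterior-mean predictor, its per-step posterior standard deviation, and a sampler of the forward process q(x_t | x_0).

import torch

@torch.no_grad()
def infer_random_seed(x0_src, src_text, mu_theta, sigma, q_sample_trajectory, T):
    """Recover the Gaussian noises ("random seed") that reproduce x0_src under src_text."""
    xs = q_sample_trajectory(x0_src, T)  # xs[t] ~ q(x_t | x0_src); xs[0] == x0_src
    eps = {}
    for t in range(T, 0, -1):
        # Solve x_{t-1} = mu_theta(x_t, t, src_text) + sigma(t) * eps_t for eps_t,
        # i.e., the noise that maps x_t to the sampled x_{t-1} under the source text.
        eps[t] = (xs[t - 1] - mu_theta(xs[t], t, src_text)) / sigma(t)
    return xs[T], eps

@torch.no_grad()
def edit_with_seed(x_T, eps, tgt_text, mu_theta, sigma, T):
    """Re-run the reverse process with the same "seed" but conditioned on the target text."""
    x = x_T
    for t in range(T, 0, -1):
        x = mu_theta(x, t, tgt_text) + sigma(t) * eps[t]
    return x  # edited image

Because the inferred noises are reused step by step, the target-text generation stays close to the source image wherever the two prompts agree; this assumes a stochastic (DDPM-style) sampler with nonzero sigma(t).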
Cite
Text
Wu and De la Torre. "Making Text-to-Image Diffusion Models Zero-Shot Image-to-Image Editors by Inferring "Random Seeds"." NeurIPS 2022 Workshops: SBM, 2022.
Markdown
[Wu and De la Torre. "Making Text-to-Image Diffusion Models Zero-Shot Image-to-Image Editors by Inferring "Random Seeds"." NeurIPS 2022 Workshops: SBM, 2022.](https://mlanthology.org/neuripsw/2022/wu2022neuripsw-making/)
BibTeX
@inproceedings{wu2022neuripsw-making,
title = {{Making Text-to-Image Diffusion Models Zero-Shot Image-to-Image Editors by Inferring "Random Seeds"}},
author = {Wu, Chen Henry and De la Torre, Fernando},
booktitle = {NeurIPS 2022 Workshops: SBM},
year = {2022},
url = {https://mlanthology.org/neuripsw/2022/wu2022neuripsw-making/}
}