Improving Image Synthesis with Diffusion-Negative Sampling

Abstract

For image generation with diffusion models (DMs), a negative prompt n can be used to complement the text prompt p, helping define properties not desired in the synthesized image. While this improves prompt adherence and image quality, finding good negative prompts is challenging. We argue that this is due to a semantic gap between humans and DMs, which makes good negative prompts for DMs appear unintuitive to humans. To bridge this gap, we propose a new diffusion-negative prompting (DNP) strategy. DNP is based on a new procedure to sample images that are least compliant with p under the distribution of the DM, denoted as diffusion-negative sampling (DNS). Given p, one such image is sampled, which is then translated into natural language by the user or a captioning model, to produce the negative prompt n∗. The pair (p, n∗) is finally used to prompt the DM. DNS is straightforward to implement and requires no training. Experiments and human evaluations show that DNP performs well both quantitatively and qualitatively and can be easily combined with several DM variants.
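The three steps described in the abstract (sample a negative image for p, caption it to obtain n∗, then prompt the DM with the pair) can be sketched as a small pipeline. This is only an illustrative outline, not the authors' implementation: `dnp_generate`, `sample_negative_image`, `caption_image`, and `generate` are hypothetical names, and the three callables stand in for whatever DM and captioning model are actually used. The `guided_noise` helper shows the form in which negative prompts are commonly consumed in classifier-free guidance, where the negative-prompt prediction takes the place of the unconditional one; the abstract does not state that the paper uses exactly this form, so treat it as an assumption.

```python
from typing import Any, Callable, List, Sequence, Tuple


def guided_noise(eps_pos: Sequence[float],
                 eps_neg: Sequence[float],
                 scale: float) -> List[float]:
    """Combine two noise predictions, classifier-free-guidance style.

    Assumption: the negative-prompt prediction replaces the unconditional
    one, giving eps = eps_neg + scale * (eps_pos - eps_neg).
    """
    return [n + scale * (p - n) for p, n in zip(eps_pos, eps_neg)]


def dnp_generate(prompt: str,
                 sample_negative_image: Callable[[str], Any],
                 caption_image: Callable[[Any], str],
                 generate: Callable[[str, str], Any]) -> Tuple[Any, str]:
    """Hypothetical outline of the prompting loop described in the abstract.

    sample_negative_image: returns an image that poorly matches `prompt`
    caption_image:         turns that image into text (user or captioner)
    generate:              runs the DM with (prompt, negative_prompt)
    """
    negative_image = sample_negative_image(prompt)   # sample a negative image
    negative_prompt = caption_image(negative_image)  # translate it into n*
    image = generate(prompt, negative_prompt)        # prompt the DM with (p, n*)
    return image, negative_prompt
```

Passing the models in as callables keeps the sketch independent of any particular diffusion library; in practice `generate` would wrap a text-to-image pipeline that accepts a negative prompt.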

Cite

Text

Desai and Vasconcelos. "Improving Image Synthesis with Diffusion-Negative Sampling." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73668-1_12

Markdown

[Desai and Vasconcelos. "Improving Image Synthesis with Diffusion-Negative Sampling." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/desai2024eccv-improving/) doi:10.1007/978-3-031-73668-1_12

BibTeX

@inproceedings{desai2024eccv-improving,
  title     = {{Improving Image Synthesis with Diffusion-Negative Sampling}},
  author    = {Desai, Alakh and Vasconcelos, Nuno},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73668-1_12},
  url       = {https://mlanthology.org/eccv/2024/desai2024eccv-improving/}
}