QSD: Query-Selection Denoising Score for Image Editing in Latent Diffusion Model

Abstract

With the surge of interest in diffusion models for image editing, text-prompt interfaces such as Midjourney, DALL-E, and Stable Diffusion have become widely used. Recently, the Contrastive Denoising Score (CDS) method has emerged as a state-of-the-art approach among image-to-image translation diffusion models, combining Delta Denoising Score (DDS) with a contrastive loss based on Contrastive learning for Unpaired image-to-image Translation (CUT). However, CDS struggles to preserve content from the original image: the object in the source image is not adequately replaced by the user-intended query object, and the surrounding content is not well preserved. We observe two main issues with CDS's use of the CUT method: it randomly selects features that are unrelated to the query objects, and it fails to capture structural details. To address these issues, we propose the Query-Selection Denoising Score (QSD) using a latent diffusion model. Our approach uses query-selected attention maps together with contrastive learning to separate the object of interest from the rest of the image, and a Latent Diffusion Model (LDM) under self-supervised learning to preserve the object's structural shape. In qualitative and quantitative evaluations, our method outperforms other diffusion models, editing the object according to the text prompt while preserving the other parts of the image.
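The abstract contrasts CUT's random choice of patch features with query-selected attention. The sketch below illustrates that contrast under stated assumptions; it is not the authors' implementation, and the abstract does not say exactly how QSD selects queries. It assumes an entropy-based rule in the style of QS-Attn (CVPR 2022): compute a softmax self-attention map over source patch features and keep the rows with the lowest entropy, then apply a PatchNCE-style InfoNCE loss only on those patches. All function names, the temperature `tau`, and the random stand-in features are hypothetical; in QSD the features would come from the latent diffusion model, and the loss would be combined with DDS, both of which are omitted here.

```python
import numpy as np


def attention_entropy(feats):
    """Row-wise entropy of a softmax self-attention map over patch features.

    feats: (N, C) array of N patch feature vectors (hypothetical stand-ins
    for diffusion-model features).
    """
    logits = feats @ feats.T / np.sqrt(feats.shape[1])
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    return -(attn * np.log(attn + 1e-12)).sum(axis=1)


def select_queries(feats, k):
    """Assumed query selection (QS-Attn style): keep the k patches whose
    attention rows have the lowest entropy, i.e. the most focused ones."""
    return np.argsort(attention_entropy(feats))[:k]


def random_queries(n, k, rng):
    """CUT-style baseline: sample k patch indices uniformly at random."""
    return rng.choice(n, size=k, replace=False)


def patch_nce_loss(src, tgt, idx, tau=0.07):
    """InfoNCE over the selected patches: each edited (target) patch should
    match the source patch at the same location (positive) rather than the
    other selected source patches (negatives)."""
    q = tgt[idx] / np.linalg.norm(tgt[idx], axis=1, keepdims=True)
    k = src[idx] / np.linalg.norm(src[idx], axis=1, keepdims=True)
    logits = q @ k.T / tau
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))  # positives lie on the diagonal


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src = rng.normal(size=(64, 16))               # stand-in source features
    tgt = src + 0.1 * rng.normal(size=(64, 16))   # stand-in edited features
    print("query-selected loss:", patch_nce_loss(src, tgt, select_queries(src, 16)))
    print("random-patch loss:  ", patch_nce_loss(src, tgt, random_queries(64, 16, rng)))
```

The only difference between the two calls is which patch indices enter the contrastive loss; the point of query selection, as the abstract frames it, is to avoid spending the loss on features unrelated to the query object.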

Cite

Text

Hwang et al. "QSD: Query-Selection Denoising Score for Image Editing in Latent Diffusion Model." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-91838-4_14

Markdown

[Hwang et al. "QSD: Query-Selection Denoising Score for Image Editing in Latent Diffusion Model." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/hwang2024eccvw-qsd/) doi:10.1007/978-3-031-91838-4_14

BibTeX

@inproceedings{hwang2024eccvw-qsd,
  title     = {{QSD: Query-Selection Denoising Score for Image Editing in Latent Diffusion Model}},
  author    = {Hwang, Jungmin and Lim, Changwon and Lee, Wonsook},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2024},
  pages     = {229--243},
  doi       = {10.1007/978-3-031-91838-4_14},
  url       = {https://mlanthology.org/eccvw/2024/hwang2024eccvw-qsd/}
}