Salient Object-Aware Background Generation Using Text-Guided Diffusion Models

Abstract

Generating background scenes for salient objects plays a crucial role across various domains, including creative design and e-commerce, as it enhances the presentation and context of subjects by integrating them into tailored environments. Background generation can be framed as text-conditioned outpainting, where the goal is to extend image content beyond the boundaries of a salient object placed on a blank background. Although popular diffusion models for text-guided inpainting can also be used for outpainting by mask inversion, they are trained to fill in missing parts of an image rather than to place an object into a scene. Consequently, when used for background creation, inpainting models frequently extend the salient object’s boundaries and thereby change the object’s identity, a phenomenon we call "object expansion." This paper introduces a model for adapting inpainting diffusion models to the salient object outpainting task using Stable Diffusion and ControlNet architectures. We present a series of qualitative and quantitative results across models and datasets, including a newly proposed metric to measure object expansion that does not require any human labeling. Compared to Stable Diffusion 2.0 Inpainting, our proposed approach reduces object expansion by 3.6× on average with no degradation in standard visual metrics across multiple datasets.
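The two ideas the abstract names, mask inversion and object expansion, can be illustrated with a minimal NumPy sketch. The function names `outpainting_mask` and `expansion_ratio` are hypothetical. The ratio below is only one plausible way to quantify expansion and is not the metric defined in the paper. It also assumes that a second object mask has been re-estimated from the generated image, for example by a salient object segmentation model.

```python
import numpy as np


def outpainting_mask(object_mask: np.ndarray) -> np.ndarray:
    """Invert a binary object mask so the background becomes the region to generate.

    An inpainting model fills the pixels where the mask is True, so passing
    the inverted object mask turns inpainting into outpainting.
    """
    return ~object_mask.astype(bool)


def expansion_ratio(original_mask: np.ndarray, generated_mask: np.ndarray) -> float:
    """Illustrative expansion measure (an assumption, not the paper's metric).

    Counts pixels labeled as object in the generated image that were
    background in the original, relative to the original object area.
    """
    original = original_mask.astype(bool)
    generated = generated_mask.astype(bool)
    if original.shape != generated.shape:
        raise ValueError("masks must have the same shape")
    area = int(original.sum())
    if area == 0:
        raise ValueError("original mask contains no object pixels")
    added = int(np.logical_and(generated, ~original).sum())
    return added / area


# Toy example: a 2x2 object on a 6x6 canvas.
original = np.zeros((6, 6), dtype=bool)
original[2:4, 2:4] = True

# Hypothetical mask re-estimated after generation: the object grew vertically.
generated = np.zeros((6, 6), dtype=bool)
generated[1:5, 2:4] = True

fill_region = outpainting_mask(original)
ratio = expansion_ratio(original, generated)
```

In this toy case the fill region covers the 32 background pixels, and the object gains 4 pixels on top of its original 4, giving a ratio of 1.0. A ratio of 0.0 would mean the generated background left the object's boundary untouched.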

Cite

Text

Eshratifar et al. "Salient Object-Aware Background Generation Using Text-Guided Diffusion Models." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00744

Markdown

[Eshratifar et al. "Salient Object-Aware Background Generation Using Text-Guided Diffusion Models." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/eshratifar2024cvprw-salient/) doi:10.1109/CVPRW63382.2024.00744

BibTeX

@inproceedings{eshratifar2024cvprw-salient,
  title     = {{Salient Object-Aware Background Generation Using Text-Guided Diffusion Models}},
  author    = {Eshratifar, Amir Erfan and Soares, João V. B. and Thadani, Kapil and Mishra, Shaunak and Kuznetsov, Mikhail and Ku, Yueh-Ning and de Juan, Paloma},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {7489--7499},
  doi       = {10.1109/CVPRW63382.2024.00744},
  url       = {https://mlanthology.org/cvprw/2024/eshratifar2024cvprw-salient/}
}