TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition

Shilin Lu, Yanzhu Liu, Adams Wai-Kin Kong

ICCV 2023 pp. 2294-2305

doi:10.1109/ICCV51070.2023.00218

Abstract

Text-driven diffusion models have exhibited impressive generative capabilities, enabling various image editing tasks. In this paper, we propose TF-ICON, a novel Training-Free Image COmpositioN framework that harnesses the power of text-driven diffusion models for cross-domain image-guided composition. This task aims to seamlessly integrate user-provided objects into a specific visual context. Current diffusion-based methods often involve costly instance-based optimization or finetuning of pretrained models on customized datasets, which can potentially undermine their rich prior. In contrast, TF-ICON can leverage off-the-shelf diffusion models to perform cross-domain image-guided composition without requiring additional training, finetuning, or optimization. Moreover, we introduce the exceptional prompt, which contains no information, to facilitate text-driven diffusion models in accurately inverting real images into latent representations, forming the basis for compositing. Our experiments show that equipping Stable Diffusion with the exceptional prompt outperforms state-of-the-art inversion methods on various datasets (CelebA-HQ, COCO, and ImageNet), and that TF-ICON surpasses prior baselines in versatile visual domains. Code is available at https://github.com/Shilin-LU/TF-ICON

Cite

Text

Lu et al. "TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.00218

Markdown

[Lu et al. "TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/lu2023iccv-tficon/) doi:10.1109/ICCV51070.2023.00218

BibTeX

@inproceedings{lu2023iccv-tficon,
  title     = {{TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition}},
  author    = {Lu, Shilin and Liu, Yanzhu and Kong, Adams Wai-Kin},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {2294--2305},
  doi       = {10.1109/ICCV51070.2023.00218},
  url       = {https://mlanthology.org/iccv/2023/lu2023iccv-tficon/}
}