ComFusion: Enhancing Personalized Generation by Instance-Scene Compositing and Fusion

Abstract

Recent progress in personalizing text-to-image (T2I) diffusion models has demonstrated their capability to generate images based on personalized visual concepts using only a few user-provided examples. However, these models often struggle with maintaining high visual fidelity, particularly when modifying scenes according to textual descriptions. To address this challenge, we introduce ComFusion, an innovative approach that leverages pretrained models to create compositions of user-supplied subject images and predefined text scenes. ComFusion incorporates a class-scene prior preservation regularization, which combines subject class and scene-specific knowledge from pretrained models to enhance generation fidelity. Additionally, ComFusion uses coarse-generated images to ensure alignment with both the instance images and scene texts, thereby achieving a delicate balance between capturing the subject’s essence and maintaining scene fidelity. Extensive evaluations of ComFusion against various baselines in T2I personalization have demonstrated its qualitative and quantitative superiority.

Cite

Text

Hong et al. "ComFusion: Enhancing Personalized Generation by Instance-Scene Compositing and Fusion." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72784-9_1

Markdown

[Hong et al. "ComFusion: Enhancing Personalized Generation by Instance-Scene Compositing and Fusion." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/hong2024eccv-comfusion/) doi:10.1007/978-3-031-72784-9_1

BibTeX

@inproceedings{hong2024eccv-comfusion,
  title     = {{ComFusion: Enhancing Personalized Generation by Instance-Scene Compositing and Fusion}},
  author    = {Hong, Yan and Duan, Yuxuan and Zhang, Bo and Chen, Haoxing and Lan, Jun and Zhu, Huijia and Wang, Weiqiang and Zhang, Jianfu},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72784-9_1},
  url       = {https://mlanthology.org/eccv/2024/hong2024eccv-comfusion/}
}