Customized Generation Reimagined: Fidelity and Editability Harmonized

Abstract

Customized generation aims to incorporate a novel concept into a pre-trained text-to-image model, enabling the model to generate that concept in novel contexts guided by textual prompts. However, customized generation suffers from an inherent trade-off between concept fidelity and editability, i.e., between precisely modeling the concept and faithfully adhering to the prompts. Previous methods settle for a compromise and struggle to achieve high concept fidelity and accurate prompt alignment simultaneously. In this paper, we propose a “Divide, Conquer, then Integrate” (DCI) framework, which performs a surgical adjustment in the early stage of denoising to free the fine-tuned model from the fidelity-editability trade-off at inference. The two conflicting components of the trade-off are decoupled and individually conquered by two collaborative branches, which are then selectively integrated to preserve high concept fidelity while achieving faithful prompt adherence. To obtain a better fine-tuned model, we introduce an Image-specific Context Optimization (ICO) strategy for model customization. ICO replaces manual prompt templates with learnable image-specific contexts, providing an adaptive and precise fine-tuning direction that improves overall performance. Extensive experiments demonstrate the effectiveness of our method in reconciling the fidelity-editability trade-off. Code is available at https://github.com/jinjianRick/DCI_ICO.

Cite

Text

Jin et al. "Customized Generation Reimagined: Fidelity and Editability Harmonized." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72973-7_24

Markdown

[Jin et al. "Customized Generation Reimagined: Fidelity and Editability Harmonized." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/jin2024eccv-customized/) doi:10.1007/978-3-031-72973-7_24

BibTeX

@inproceedings{jin2024eccv-customized,
  title     = {{Customized Generation Reimagined: Fidelity and Editability Harmonized}},
  author    = {Jin, Jian and Shen, Yang and Fu, Zhenyong and Yang, Jian},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72973-7_24},
  url       = {https://mlanthology.org/eccv/2024/jin2024eccv-customized/}
}