Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance

Abstract

In subject-driven text-to-image synthesis, the synthesis process tends to be heavily influenced by the reference images provided by users, often overlooking crucial attributes detailed in the text prompt. In this work, we propose Subject-Agnostic Guidance (SAG), a simple yet effective solution to this problem. We show that by constructing a subject-agnostic condition and applying our proposed dual classifier-free guidance, one can obtain outputs consistent with both the given subject and the input text prompt. We validate the efficacy of our approach through both optimization-based and encoder-based methods. Additionally, we demonstrate its applicability in second-order customization methods, where an encoder-based model is fine-tuned with DreamBooth. Our approach is conceptually simple and requires only minimal code modifications, but it leads to substantial quality improvements, as evidenced by our evaluations and user studies.
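To make the "dual classifier-free guidance" idea concrete, below is a minimal, hedged sketch of one plausible combination rule: standard classifier-free guidance is applied twice, once from the unconditional prediction toward a subject-agnostic condition, and once from the subject-agnostic prediction toward the full (subject-aware) condition. The function name `dual_cfg`, the weight names, and the exact combination are illustrative assumptions, not the paper's verbatim formulation.

```python
def dual_cfg(eps_uncond, eps_agnostic, eps_subject, w_agnostic, w_subject):
    """Sketch of a dual classifier-free guidance combination (assumed form).

    eps_uncond   -- noise prediction with the null condition
    eps_agnostic -- noise prediction with the subject-agnostic text condition
    eps_subject  -- noise prediction with the full subject-aware condition
    w_agnostic   -- guidance weight pulling toward the text semantics
    w_subject    -- guidance weight pulling toward the subject appearance
    """
    # Two nested CFG terms: text guidance first, then subject guidance on top.
    return [
        u + w_agnostic * (a - u) + w_subject * (s - a)
        for u, a, s in zip(eps_uncond, eps_agnostic, eps_subject)
    ]


if __name__ == "__main__":
    # With both weights at 1.0 the rule collapses to the subject-aware
    # prediction; raising w_agnostic shifts the output toward the prompt.
    out = dual_cfg([0.0, 0.0], [1.0, 1.0], [2.0, 2.0], 1.0, 1.0)
    print(out)  # → [2.0, 2.0]
```

In practice the three predictions would come from three forward passes of the same diffusion U-Net under different conditions; only the combination step changes, which is consistent with the abstract's claim of minimal code modifications.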

Cite

Text

Chan et al. "Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00643

Markdown

[Chan et al. "Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/chan2024cvpr-improving/) doi:10.1109/CVPR52733.2024.00643

BibTeX

@inproceedings{chan2024cvpr-improving,
  title     = {{Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance}},
  author    = {Chan, Kelvin C.K. and Zhao, Yang and Jia, Xuhui and Yang, Ming-Hsuan and Wang, Huisheng},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {6733--6742},
  doi       = {10.1109/CVPR52733.2024.00643},
  url       = {https://mlanthology.org/cvpr/2024/chan2024cvpr-improving/}
}