Image Is All You Need to Empower Large-Scale Diffusion Models for In-Domain Generation

Abstract

In-domain generation aims to perform a variety of tasks within a specific domain, such as unconditional generation, text-to-image synthesis, image editing, 3D generation, and more. Early research typically required training specialized generators for each unique task and domain, often relying on fully labeled data. Motivated by the powerful generative capabilities and broad applicability of diffusion models, we explore leveraging label-free data to empower these models for in-domain generation. Fine-tuning a pre-trained generative model on domain data is an intuitive but challenging approach: it often requires complex manual hyper-parameter tuning, since the limited diversity of the training data can easily disrupt the model's original generative capabilities. To address this challenge, we propose a guidance-decoupled prior preservation mechanism that achieves high generative quality and controllability from image-only data, inspired by preserving the pre-trained model from a denoising-guidance perspective. We decouple domain-related guidance from the conditional guidance used in classifier-free guidance, thereby preserving the pre-trained model's open-world control guidance and unconditional guidance. We further propose an efficient domain knowledge learning technique that trains an additional text-free UNet copy to predict domain guidance. In addition, we theoretically illustrate a multi-guidance in-domain generation pipeline for a variety of generative tasks, leveraging multiple guidances from distinct diffusion models and conditions. Extensive experiments demonstrate the superiority of our method in domain-specific synthesis and its compatibility with various diffusion-based control methods and applications.
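
To make the guidance-decoupling idea concrete, below is a minimal sketch of how the decoupled domain guidance could be combined with the pre-trained model's unconditional and text guidance at a single denoising step. It assumes an additive, classifier-free-guidance-style combination; the function name, guidance scales, and latent shape are illustrative assumptions, not the paper's exact formulation.

import torch

def decoupled_guidance(
    eps_uncond: torch.Tensor,   # unconditional prediction from the frozen pre-trained UNet
    eps_text: torch.Tensor,     # text-conditional prediction from the frozen pre-trained UNet
    eps_domain: torch.Tensor,   # prediction from the additional text-free UNet copy
    w_text: float = 7.5,        # standard classifier-free guidance scale (assumed value)
    w_domain: float = 2.0,      # scale for the decoupled domain guidance (assumed value)
) -> torch.Tensor:
    """Combine unconditional, text, and domain guidance into one noise estimate.

    The text and domain terms are kept as separate directions relative to the
    unconditional prediction, so the pre-trained model's open-world control
    guidance and unconditional guidance are left untouched.
    """
    text_dir = eps_text - eps_uncond      # open-world control guidance
    domain_dir = eps_domain - eps_uncond  # domain-related guidance
    return eps_uncond + w_text * text_dir + w_domain * domain_dir

# Toy usage with random tensors standing in for UNet outputs at one denoising step.
if __name__ == "__main__":
    shape = (1, 4, 64, 64)  # latent shape typical of Stable Diffusion (assumed)
    eps_u, eps_t, eps_d = (torch.randn(shape) for _ in range(3))
    eps = decoupled_guidance(eps_u, eps_t, eps_d)
    print(eps.shape)

In a full pipeline, additional guidance directions from other diffusion models or conditions could be summed in the same way, which is how the multi-guidance generation described in the abstract would plug in.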

Cite

Text

Cao et al. "Image Is All You Need to Empower Large-Scale Diffusion Models for In-Domain Generation." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01711

Markdown

[Cao et al. "Image Is All You Need to Empower Large-Scale Diffusion Models for In-Domain Generation." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/cao2025cvpr-image/) doi:10.1109/CVPR52734.2025.01711

BibTeX

@inproceedings{cao2025cvpr-image,
  title     = {{Image Is All You Need to Empower Large-Scale Diffusion Models for In-Domain Generation}},
  author    = {Cao, Pu and Zhou, Feng and Yang, Lu and Huang, Tianrui and Song, Qing},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {18358--18368},
  doi       = {10.1109/CVPR52734.2025.01711},
  url       = {https://mlanthology.org/cvpr/2025/cao2025cvpr-image/}
}