Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder

Abstract

Text-to-Image (T2I) Diffusion Models have achieved remarkable performance in generating high-quality images. However, enabling precise control of continuous attributes, especially multiple attributes simultaneously, in a new domain (e.g., numeric values like eye openness or car width) with text-only guidance remains a significant challenge. To address this, we introduce the **Attribute (Att) Adapter**, a novel plug-and-play module designed to enable fine-grained, multi-attribute control in pretrained diffusion models. Our approach learns a single control adapter from a set of sample images that can be unpaired and contain multiple visual attributes. The Att-Adapter leverages the decoupled cross-attention module to naturally harmonize the multiple domain attributes with text conditioning. We further introduce a Conditional Variational Autoencoder (CVAE) into the Att-Adapter to mitigate overfitting, matching the diverse nature of the visual world. Evaluations on two public datasets show that Att-Adapter outperforms all LoRA-based baselines in controlling continuous attributes. Additionally, our method enables a broader control range and improves disentanglement across multiple attributes, surpassing StyleGAN-based techniques. Notably, Att-Adapter is flexible, requiring no paired synthetic data for training, and is easily scalable to multiple attributes within a single model.
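To make the two ingredients named in the abstract concrete, the sketch below shows one plausible way a decoupled cross-attention block can combine the frozen text branch with an extra attribute branch whose conditioning tokens are sampled from a CVAE-style encoder over continuous attribute values. This is a minimal illustration under our own assumptions, not the authors' implementation; all module names, dimensions, token counts, and the `scale` parameter are hypothetical.

```python
# Hedged sketch (not the paper's code): decoupled cross-attention with an
# attribute branch, conditioned on tokens sampled from a toy CVAE-style encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttributeCVAEEncoder(nn.Module):
    """Maps continuous attribute values to sampled conditioning tokens (assumed design)."""
    def __init__(self, num_attrs: int, token_dim: int, num_tokens: int = 4):
        super().__init__()
        self.num_tokens, self.token_dim = num_tokens, token_dim
        hidden = 256
        self.net = nn.Sequential(nn.Linear(num_attrs, hidden), nn.SiLU())
        self.to_mu = nn.Linear(hidden, num_tokens * token_dim)
        self.to_logvar = nn.Linear(hidden, num_tokens * token_dim)

    def forward(self, attrs: torch.Tensor) -> torch.Tensor:
        h = self.net(attrs)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample tokens instead of predicting them deterministically.
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return z.view(-1, self.num_tokens, self.token_dim)


class DecoupledCrossAttention(nn.Module):
    """Text cross-attention plus a parallel attribute cross-attention, summed."""
    def __init__(self, query_dim: int, context_dim: int, scale: float = 1.0):
        super().__init__()
        self.scale = scale
        self.to_q = nn.Linear(query_dim, query_dim, bias=False)
        # Text branch (in practice these would be the frozen pretrained UNet projections).
        self.to_k_text = nn.Linear(context_dim, query_dim, bias=False)
        self.to_v_text = nn.Linear(context_dim, query_dim, bias=False)
        # Attribute branch: the newly trained key/value projections of the adapter.
        self.to_k_attr = nn.Linear(context_dim, query_dim, bias=False)
        self.to_v_attr = nn.Linear(context_dim, query_dim, bias=False)
        self.to_out = nn.Linear(query_dim, query_dim)

    def forward(self, x, text_ctx, attr_ctx):
        q = self.to_q(x)
        attn_text = F.scaled_dot_product_attention(
            q, self.to_k_text(text_ctx), self.to_v_text(text_ctx))
        attn_attr = F.scaled_dot_product_attention(
            q, self.to_k_attr(attr_ctx), self.to_v_attr(attr_ctx))
        # Attribute and text conditioning are harmonized by summing the two branches.
        return self.to_out(attn_text + self.scale * attn_attr)


if __name__ == "__main__":
    B, N, D_q, D_c = 2, 64, 320, 768  # illustrative batch, latent tokens, dims
    enc = AttributeCVAEEncoder(num_attrs=3, token_dim=D_c)
    attn = DecoupledCrossAttention(query_dim=D_q, context_dim=D_c)
    attrs = torch.tensor([[0.2, 0.8, 0.5], [0.9, 0.1, 0.3]])  # e.g. eye openness, car width, ...
    out = attn(torch.randn(B, N, D_q), torch.randn(B, 77, D_c), enc(attrs))
    print(out.shape)  # torch.Size([2, 64, 320])
```

In this reading, sampling the attribute tokens from a distribution (rather than a point estimate) is what lets the adapter match the diverse visual world and avoid overfitting; the exact losses and conditioning details are in the paper itself.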

Cite

Text

Cho et al. "Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder." International Conference on Computer Vision, 2025.

Markdown

[Cho et al. "Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/cho2025iccv-attadapter/)

BibTeX

@inproceedings{cho2025iccv-attadapter,
  title     = {{Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder}},
  author    = {Cho, Wonwoong and Chen, Yan-Ying and Klenk, Matthew and Inouye, David I. and Zhang, Yanxia},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {15626--15635},
  url       = {https://mlanthology.org/iccv/2025/cho2025iccv-attadapter/}
}