Controllable Human Image Generation with Personalized Multi-Garments

Abstract

We present BootComp, a novel framework based on text-to-image diffusion models for controllable human image generation with multiple reference garments. Here, the main bottleneck is data acquisition for training: collecting a large-scale dataset of high-quality reference garment images per human subject is quite challenging, since, ideally, one would need to manually gather a photograph of every single garment worn by each person. To address this, we propose a data generation pipeline that constructs a large synthetic dataset of human and multiple-garment pairs by introducing a model that extracts any reference garment image from each human image. To ensure data quality, we also propose a filtering strategy that removes undesirable generated data by measuring the perceptual similarity between the garment shown in the human image and the extracted garment. Finally, using the constructed synthetic dataset, we train a diffusion model with two parallel denoising paths that use multiple garment images as conditions to generate human images while preserving their fine-grained details. We further show the wide applicability of our framework by adapting it to different types of reference-based generation in the fashion domain, including virtual try-on and controllable human image generation with other conditions such as pose and face.
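The filtering step described above can be pictured as thresholding a similarity score between the garment as it appears in the human image and the extracted garment image. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes both images have already been mapped to feature vectors by some perceptual encoder (the abstract does not name one), it uses cosine similarity as the score, and the names `GarmentPair`, `filter_pairs`, and the 0.8 threshold are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class GarmentPair:
    """One synthetic sample: a garment region in a human image and its extracted garment.

    The embeddings are plain float vectors standing in for features from a
    perceptual encoder (hypothetical; the encoder is an assumption).
    """
    sample_id: str
    human_garment_emb: Sequence[float]      # features of the garment region in the human image
    extracted_garment_emb: Sequence[float]  # features of the extracted garment image


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pairs(pairs: List[GarmentPair], threshold: float = 0.8) -> List[GarmentPair]:
    """Keep only pairs whose two garment views are similar enough (hypothetical threshold)."""
    return [
        p for p in pairs
        if cosine_similarity(p.human_garment_emb, p.extracted_garment_emb) >= threshold
    ]


# Toy usage: pair "a" is nearly aligned and is kept; pair "b" is orthogonal and is dropped.
pairs = [
    GarmentPair("a", [1.0, 0.0, 0.0], [0.9, 0.1, 0.0]),
    GarmentPair("b", [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
]
kept = filter_pairs(pairs)
print([p.sample_id for p in kept])  # prints ['a']
```

In practice the threshold trades dataset size against purity: a higher value discards more extraction failures but also more valid pairs, so it would presumably be tuned on a held-out set of manually checked samples.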

Cite

Text

Choi et al. "Controllable Human Image Generation with Personalized Multi-Garments." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.02676

Markdown

[Choi et al. "Controllable Human Image Generation with Personalized Multi-Garments." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/choi2025cvpr-controllable/) doi:10.1109/CVPR52734.2025.02676

BibTeX

@inproceedings{choi2025cvpr-controllable,
  title     = {{Controllable Human Image Generation with Personalized Multi-Garments}},
  author    = {Choi, Yisol and Kwak, Sangkyung and Yu, Sihyun and Choi, Hyungwon and Shin, Jinwoo},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {28736--28747},
  doi       = {10.1109/CVPR52734.2025.02676},
  url       = {https://mlanthology.org/cvpr/2025/choi2025cvpr-controllable/}
}