UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer

Abstract

Text-to-image (T2I) models such as Stable Diffusion have been used to generate high-quality images of people. However, due to the random nature of the generation process, the person's appearance, e.g. pose, face, and clothing, differs across images generated from the same text prompt. This appearance inconsistency makes T2I models unsuitable for pose transfer. We address this by proposing a multimodal diffusion model that accepts text, pose, and visual prompting. Our model is the first unified method to perform all person image tasks: generation, pose transfer, and mask-less editing. We also pioneer the direct use of low-dimensional 3D body model parameters to demonstrate a new capability: simultaneous pose and camera view interpolation while maintaining the person's appearance.
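
As a rough illustration of the pose and camera view interpolation described above, the sketch below linearly interpolates between two low-dimensional body-model parameter vectors to produce a sequence of conditioning vectors. The vector size, function names, and the idea of feeding each interpolated vector to the diffusion model as a separate conditioning input are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np

def interpolate_pose_params(pose_a: np.ndarray, pose_b: np.ndarray, num_steps: int):
    """Linearly interpolate between two low-dimensional pose/camera
    parameter vectors, yielding one conditioning vector per step."""
    ts = np.linspace(0.0, 1.0, num_steps)
    return [(1.0 - t) * pose_a + t * pose_b for t in ts]

# Hypothetical usage: each interpolated vector would condition one
# sampling pass of the diffusion model, producing frames that move
# smoothly from pose A to pose B while the text and visual prompts
# (and hence the person's appearance) stay fixed.
pose_a = np.zeros(85)                 # assumed SMPL-style pose + shape + camera params
pose_b = np.random.randn(85) * 0.1    # a second, slightly different pose
conditions = interpolate_pose_params(pose_a, pose_b, num_steps=8)
```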

Cite

Text

Cheong et al. "UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer." IEEE/CVF International Conference on Computer Vision Workshops, 2023. doi:10.1109/ICCVW60793.2023.00451

Markdown

[Cheong et al. "UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer." IEEE/CVF International Conference on Computer Vision Workshops, 2023.](https://mlanthology.org/iccvw/2023/cheong2023iccvw-upgpt/) doi:10.1109/ICCVW60793.2023.00451

BibTeX

@inproceedings{cheong2023iccvw-upgpt,
  title     = {{UPGPT: Universal Diffusion Model for Person Image Generation, Editing and Pose Transfer}},
  author    = {Cheong, Soon Yau and Mustafa, Armin and Gilbert, Andrew},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2023},
  pages     = {4175-4184},
  doi       = {10.1109/ICCVW60793.2023.00451},
  url       = {https://mlanthology.org/iccvw/2023/cheong2023iccvw-upgpt/}
}