DreamPose: Fashion Video Synthesis with Stable Diffusion

Abstract

We present DreamPose, a diffusion-based method for generating animated fashion videos from still images. Given an image and a sequence of human body poses, our method synthesizes a video containing both human and fabric motion. To achieve this, we transform a pretrained text-to-image model (Stable Diffusion) into a pose-and-image guided video synthesis model, using a novel fine-tuning strategy, a set of architectural changes to support the added conditioning signals, and techniques to encourage temporal consistency. We fine-tune on a collection of fashion videos from the UBC Fashion dataset. We evaluate our method on a variety of clothing styles and poses, and demonstrate that it produces state-of-the-art results on fashion video animation. Video results are available at www.grail.cs.washington.edu/projects/dreampose.
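The abstract describes turning a pretrained text-to-image latent diffusion model into a pose-and-image conditioned denoiser. The sketch below illustrates one plausible wiring for that kind of conditioning: pose maps are concatenated with the noisy latents at a widened input convolution, and an image embedding takes the place of the text embedding in cross-attention. This is a minimal toy stand-in, not the authors' implementation; the module names, channel counts, and tensor shapes are illustrative assumptions.

# Hypothetical sketch of pose-and-image conditioned denoising; shapes and
# module names are illustrative assumptions, not DreamPose's actual code.
import torch
import torch.nn as nn

class PoseConditionedDenoiser(nn.Module):
    """Toy stand-in for a Stable-Diffusion-style UNet whose input conv is
    widened to accept pose maps concatenated with the noisy latents, and
    whose cross-attention attends to an image embedding instead of text."""

    def __init__(self, latent_ch=4, pose_ch=5, hidden=64, emb_dim=768):
        super().__init__()
        # Widened input conv: latent channels plus pose-map channels.
        self.conv_in = nn.Conv2d(latent_ch + pose_ch, hidden, 3, padding=1)
        # Cross-attention over the conditioning image embedding.
        self.attn = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.to_kv = nn.Linear(emb_dim, hidden)
        self.conv_out = nn.Conv2d(hidden, latent_ch, 3, padding=1)

    def forward(self, noisy_latent, pose_maps, image_emb):
        # noisy_latent: (B, 4, H, W); pose_maps: (B, pose_ch, H, W) stack of
        # pose frames; image_emb: (B, T, emb_dim) image conditioning tokens.
        x = self.conv_in(torch.cat([noisy_latent, pose_maps], dim=1))
        b, c, h, w = x.shape
        q = x.flatten(2).transpose(1, 2)           # (B, H*W, C) queries
        kv = self.to_kv(image_emb)                 # (B, T, C) keys/values
        attn_out, _ = self.attn(q, kv, kv)
        x = (q + attn_out).transpose(1, 2).reshape(b, c, h, w)
        return self.conv_out(x)                    # predicted noise residual

# Usage: one denoising call on random tensors.
model = PoseConditionedDenoiser()
eps = model(torch.randn(1, 4, 64, 64),    # noisy latent
            torch.randn(1, 5, 64, 64),    # stacked pose maps
            torch.randn(1, 77, 768))      # image conditioning tokens
print(eps.shape)                          # torch.Size([1, 4, 64, 64])

Stacking pose maps from several consecutive frames, as the multi-channel pose input above suggests, gives the denoiser short-range temporal context at each frame, which is one simple way to encourage the temporal consistency the abstract mentions.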

Cite

Text

Karras et al. "DreamPose: Fashion Video Synthesis with Stable Diffusion." International Conference on Computer Vision, 2023.

Markdown

[Karras et al. "DreamPose: Fashion Video Synthesis with Stable Diffusion." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/karras2023iccv-dreampose/)

BibTeX

@inproceedings{karras2023iccv-dreampose,
  title     = {{DreamPose: Fashion Video Synthesis with Stable Diffusion}},
  author    = {Karras, Johanna and Holynski, Aleksander and Wang, Ting-Chun and Kemelmacher-Shlizerman, Ira},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {22680--22690},
  url       = {https://mlanthology.org/iccv/2023/karras2023iccv-dreampose/}
}