SMooDi: Stylized Motion Diffusion Model

Abstract

We introduce a novel Stylized Motion Diffusion model, dubbed SMooDi, to generate stylized motion driven by content texts and style motion sequences. Unlike existing methods that either generate motion of various content or transfer style from one sequence to another, SMooDi can rapidly generate motion across a broad range of content and diverse styles. To this end, we tailor a pre-trained text-to-motion model for stylization. Specifically, we propose style guidance to ensure that the generated motion closely matches the reference style, alongside a lightweight style adaptor that directs the motion towards the desired style while ensuring realism. Experiments across various applications demonstrate that our proposed framework outperforms existing methods in stylized motion generation. Project Page: https://neu-vi.github.io/SMooDi/

Cite

Text

Zhong et al. "SMooDi: Stylized Motion Diffusion Model." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73232-4_23

Markdown

[Zhong et al. "SMooDi: Stylized Motion Diffusion Model." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/zhong2024eccv-smoodi/) doi:10.1007/978-3-031-73232-4_23

BibTeX

@inproceedings{zhong2024eccv-smoodi,
  title     = {{SMooDi: Stylized Motion Diffusion Model}},
  author    = {Zhong, Lei and Xie, Yiming and Jampani, Varun and Sun, Deqing and Jiang, Huaizu},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73232-4_23},
  url       = {https://mlanthology.org/eccv/2024/zhong2024eccv-smoodi/}
}