EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation

Abstract

We introduce Efficient Motion Diffusion Model (EMDM) for fast and high-quality human motion generation. Current state-of-the-art generative diffusion models have produced impressive results but struggle to achieve fast generation without sacrificing quality. On the one hand, previous works, like motion latent diffusion, conduct diffusion within a latent space for efficiency, but learning such a latent space can be a non-trivial effort. On the other hand, accelerating generation by naively increasing the sampling step size, e.g., DDIM, often leads to quality degradation as it fails to approximate the complex denoising distribution. To address these issues, we propose EMDM, which captures the complex distribution during multiple sampling steps in the diffusion model, allowing for much fewer sampling steps and significant acceleration in generation. This is achieved by a conditional denoising diffusion GAN to capture multimodal data distributions among arbitrary (and potentially larger) step sizes conditioned on control signals, enabling fewer-step motion sampling with high fidelity and diversity. To minimize undesired motion artifacts, geometric losses are imposed during network learning. As a result, EMDM achieves real-time motion generation and significantly improves the efficiency of motion diffusion models compared to existing methods while achieving high-quality motion generation. Our code is available at https://github.com/Frank-ZY-Dou/EMDM.
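To make the few-step sampling idea concrete, the sketch below illustrates the generic pattern the abstract describes: a conditional generator predicts the clean sample from a heavily noised input, and sampling jumps across large step sizes instead of denoising one step at a time. This is a minimal toy illustration, not the authors' implementation; `toy_generator`, the linear noise schedule, and the chosen timesteps are all assumptions for demonstration (in EMDM the generator is a learned GAN conditioned on control signals such as text).

```python
import numpy as np

def make_alpha_bar(T=1000):
    """Cumulative product of (1 - beta) for a linear beta schedule (a common default, assumed here)."""
    betas = np.linspace(1e-4, 0.02, T)
    return np.cumprod(1.0 - betas)

def toy_generator(x_t, t, cond):
    """Stand-in for the learned conditional generator G(x_t, t, c):
    predicts the clean sample x0 from a noisy input. Here it is a
    trivial placeholder that pulls x_t toward the condition."""
    return 0.5 * x_t + 0.5 * cond

def few_step_sample(cond, steps=(999, 749, 499, 249, 0), seed=0):
    """Sample with a handful of large denoising jumps:
    at each step, predict x0 with the generator, then re-noise to the
    next (much earlier) timestep. With 5 steps instead of 1000,
    this is the source of the speedup the abstract claims."""
    rng = np.random.default_rng(seed)
    alpha_bar = make_alpha_bar()
    x = rng.standard_normal(cond.shape)  # start from pure noise
    for i, t in enumerate(steps):
        x0_hat = toy_generator(x, t, cond)
        if i + 1 < len(steps):
            ab = alpha_bar[steps[i + 1]]
            # diffuse the x0 estimate forward to the next timestep
            x = np.sqrt(ab) * x0_hat + np.sqrt(1.0 - ab) * rng.standard_normal(cond.shape)
        else:
            x = x0_hat  # final step: return the clean prediction
    return x

# usage: "motion" here is just a (frames, features) array of zeros as the target condition
motion = few_step_sample(np.zeros((60, 66)))
print(motion.shape)  # (60, 66)
```

With only a few sampling steps, a simple Gaussian-posterior sampler (as in DDPM/DDIM) would poorly approximate the true multimodal denoising distribution; EMDM's contribution is training the generator adversarially so that each large jump remains accurate.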

Cite

Text

Zhou et al. "EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72627-9_2

Markdown

[Zhou et al. "EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/zhou2024eccv-emdm/) doi:10.1007/978-3-031-72627-9_2

BibTeX

@inproceedings{zhou2024eccv-emdm,
  title     = {{EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation}},
  author    = {Zhou, Wenyang and Dou, Zhiyang and Cao, Zeyu and Liao, Zhouyingcheng and Wang, Jingbo and Wang, Wenjia and Liu, Yuan and Komura, Taku and Wang, Wenping and Liu, Lingjie},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72627-9_2},
  url       = {https://mlanthology.org/eccv/2024/zhou2024eccv-emdm/}
}