Controllable 3D Dance Generation Using Diffusion-Based Transformer U-Net
Abstract
Recently, dance generation has attracted increasing interest. In particular, the success of diffusion models in image generation has led to the emergence of dance generation systems based on the diffusion framework. However, these systems lack controllability, which limits their practical applications. In this paper, we propose a controllable dance generation method based on the diffusion model, which can generate 3D dance motions controlled by 2D keypoint sequences. Specifically, we design a transformer-based U-Net model to predict actual motions. Then, we fix the parameters of the U-Net model and train an additional control network, enabling the generated motions to be controlled by 2D keypoints. We conduct extensive experiments and compare our method with existing works on the widely used AIST++ dataset, demonstrating that our approach offers advantages over prior work as well as controllability. Moreover, we test our model on in-the-wild videos and find that it can generate dance movements similar to the motions in those videos.
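The abstract describes a ControlNet-style setup: a pretrained transformer U-Net denoiser is frozen, and a separate control branch conditioned on 2D keypoint sequences is trained to steer the generated motion. The sketch below (not the authors' code) illustrates that training pattern in PyTorch; all module names, feature dimensions, and the residual-injection scheme are assumptions for illustration only.

```python
# Minimal sketch of a frozen-denoiser + trainable-control-branch setup,
# loosely following the pattern described in the abstract. Dimensions,
# module names, and the injection point are hypothetical.
import torch
import torch.nn as nn

class TransformerUNetDenoiser(nn.Module):
    """Stand-in for the pretrained transformer U-Net that predicts motion."""
    def __init__(self, motion_dim=147, d_model=256, nhead=4, num_layers=4):
        super().__init__()
        self.in_proj = nn.Linear(motion_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.out_proj = nn.Linear(d_model, motion_dim)

    def forward(self, noisy_motion, control_residual=None):
        h = self.in_proj(noisy_motion)
        if control_residual is not None:
            h = h + control_residual          # inject control features
        return self.out_proj(self.encoder(h))

class KeypointControlNet(nn.Module):
    """Trainable branch mapping 2D keypoint sequences to residual features."""
    def __init__(self, keypoint_dim=34, d_model=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(keypoint_dim, d_model), nn.SiLU(),
            nn.Linear(d_model, d_model),
        )
        nn.init.zeros_(self.net[-1].weight)   # zero-init so training starts
        nn.init.zeros_(self.net[-1].bias)     # from the frozen model's behavior

    def forward(self, keypoints_2d):
        return self.net(keypoints_2d)

# Freeze the pretrained denoiser; only the control branch receives gradients.
denoiser = TransformerUNetDenoiser()
for p in denoiser.parameters():
    p.requires_grad_(False)

control = KeypointControlNet()
optimizer = torch.optim.AdamW(control.parameters(), lr=1e-4)

# One illustrative training step on random tensors (batch=2, frames=60).
noisy_motion = torch.randn(2, 60, 147)   # noised motion features
keypoints_2d = torch.randn(2, 60, 34)    # e.g. 17 joints x (x, y) per frame
target_motion = torch.randn(2, 60, 147)  # clean motion the model should recover

pred = denoiser(noisy_motion, control(keypoints_2d))
loss = nn.functional.mse_loss(pred, target_motion)
loss.backward()
optimizer.step()
```

Zero-initializing the last layer of the control branch is a common choice in ControlNet-style training, since it leaves the frozen model's outputs unchanged at the start of fine-tuning; whether the paper uses this exact scheme is not stated in the abstract.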
Cite
Text
Guo et al. "Controllable 3D Dance Generation Using Diffusion-Based Transformer U-Net." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I3.32339

Markdown
[Guo et al. "Controllable 3D Dance Generation Using Diffusion-Based Transformer U-Net." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/guo2025aaai-controllable/) doi:10.1609/AAAI.V39I3.32339

BibTeX
@inproceedings{guo2025aaai-controllable,
title = {{Controllable 3D Dance Generation Using Diffusion-Based Transformer U-Net}},
author = {Guo, Puyuan and Hao, Tuo and Fu, Wenxin and Gao, Yingming and Li, Ya},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
  pages = {3284--3292},
doi = {10.1609/AAAI.V39I3.32339},
url = {https://mlanthology.org/aaai/2025/guo2025aaai-controllable/}
}