EDGE: Editable Dance Generation from Music
Abstract
Dance is an important human art form, but creating new dances can be difficult and time-consuming. In this work, we introduce Editable Dance GEneration (EDGE), a state-of-the-art method for editable dance generation that is capable of creating realistic, physically plausible dances while remaining faithful to the input music. EDGE uses a transformer-based diffusion model paired with Jukebox, a strong music feature extractor, and confers powerful editing capabilities well-suited to dance, including joint-wise conditioning and in-betweening. We introduce a new metric for physical plausibility and evaluate the quality of dances generated by our method extensively through (1) multiple quantitative metrics on physical plausibility, alignment, and diversity benchmarks and, more importantly, (2) a large-scale user study, demonstrating a significant improvement over previous state-of-the-art methods. Qualitative samples from our model can be found on our website.
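To make the editing capabilities mentioned above concrete, the sketch below illustrates one common way a diffusion model can support joint-wise conditioning and in-betweening: constrained joints or frames are re-imposed at every denoising step, so the model only fills in the unconstrained regions. This is a minimal, hypothetical illustration of the general diffusion-inpainting idea, not the authors' implementation; the callable `denoise_step`, the motion layout, and the noise schedule `alphas_cumprod` are all assumptions for the sake of the example.

```python
import torch

def edit_sample(denoise_step, known_motion, mask, alphas_cumprod, music_feats):
    """Constraint-based editing via diffusion inpainting (illustrative sketch).

    denoise_step(x, t, c) -> x at step t-1: one reverse-diffusion step (hypothetical callable).
    known_motion: (T, D) reference motion; entries where mask == 1 are treated as fixed.
    mask:         (T, D) binary constraint mask (1 = keep reference, 0 = generate).
    alphas_cumprod: (num_steps,) cumulative product of the noise schedule.
    music_feats:  conditioning features extracted from the input music.
    """
    x = torch.randn_like(known_motion)  # start from pure Gaussian noise
    for t in reversed(range(len(alphas_cumprod))):
        # Re-noise the reference to the current noise level and overwrite the
        # constrained entries (standard inpainting trick for diffusion models).
        a = alphas_cumprod[t]
        noised_ref = a.sqrt() * known_motion + (1 - a).sqrt() * torch.randn_like(known_motion)
        x = mask * noised_ref + (1 - mask) * x
        # The model denoises the full sequence, filling in unconstrained regions
        # so they stay consistent with the fixed joints/frames and the music.
        x = denoise_step(x, t, music_feats)
    # Final projection so constrained entries match the reference exactly.
    return mask * known_motion + (1 - mask) * x
```

Under this scheme, in-betweening corresponds to a mask that fixes whole frames at the start and end of a clip, while joint-wise conditioning fixes selected joint dimensions across all frames; the same sampling loop handles both.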
Cite
Text
Tseng et al. "EDGE: Editable Dance Generation from Music." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00051Markdown
[Tseng et al. "EDGE: Editable Dance Generation from Music." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/tseng2023cvpr-edge/) doi:10.1109/CVPR52729.2023.00051BibTeX
@inproceedings{tseng2023cvpr-edge,
  title     = {{EDGE: Editable Dance Generation from Music}},
  author    = {Tseng, Jonathan and Castellon, Rodrigo and Liu, Karen},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {448-458},
  doi       = {10.1109/CVPR52729.2023.00051},
  url       = {https://mlanthology.org/cvpr/2023/tseng2023cvpr-edge/}
}