EDGE: Editable Dance Generation from Music

Abstract

Dance is an important human art form, but creating new dances can be difficult and time-consuming. In this work, we introduce Editable Dance GEneration (EDGE), a state-of-the-art method for editable dance generation that is capable of creating realistic, physically plausible dances while remaining faithful to the input music. EDGE uses a transformer-based diffusion model paired with Jukebox, a strong music feature extractor, and confers powerful editing capabilities well-suited to dance, including joint-wise conditioning and in-betweening. We introduce a new metric for physical plausibility and extensively evaluate the quality of dances generated by our method through (1) multiple quantitative metrics on physical plausibility, alignment, and diversity benchmarks and, more importantly, (2) a large-scale user study, demonstrating a significant improvement over previous state-of-the-art methods. Qualitative samples from our model can be found at our website.
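The editing capabilities named in the abstract (in-betweening, joint-wise conditioning) are the kind typically obtained from diffusion models by constraint-masked sampling: at each denoising step, the entries of the motion tensor covered by a constraint mask are replaced with the known motion, so the model only fills in the unconstrained frames or joints. The sketch below illustrates that general idea; it is not the authors' implementation. The denoiser is a stand-in MLP, the 60-frame / 24-joint layout, the noise schedule, and the 32-dimensional "music" placeholder (standing in for Jukebox features) are all illustrative assumptions.

import torch

T_STEPS = 50                      # number of diffusion steps (assumed)
FRAMES, JOINT_DIM = 60, 24 * 6    # 60 frames, 24 joints x 6D rotation (assumed layout)

# Stand-in denoiser: predicts x0 from (x_t, t, music conditioning).
denoiser = torch.nn.Sequential(
    torch.nn.Linear(JOINT_DIM + 1 + 32, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, JOINT_DIM),
)

betas = torch.linspace(1e-4, 0.02, T_STEPS)
alphas_cum = torch.cumprod(1.0 - betas, dim=0)

def q_sample(x0, t, noise):
    # Forward-noise x0 to step t (standard DDPM q(x_t | x_0)).
    return alphas_cum[t].sqrt() * x0 + (1 - alphas_cum[t]).sqrt() * noise

@torch.no_grad()
def edit_sample(known_motion, mask, music):
    # Sample a motion; wherever mask == 1, keep `known_motion` (e.g. the first and
    # last frames for in-betweening, or a subset of joint columns for joint-wise edits).
    x = torch.randn_like(known_motion)
    for t in reversed(range(T_STEPS)):
        t_emb = torch.full((FRAMES, 1), t / T_STEPS)
        x0_hat = denoiser(torch.cat([x, t_emb, music], dim=-1))
        # Enforce the constraint on the predicted clean motion.
        x0_hat = mask * known_motion + (1 - mask) * x0_hat
        if t > 0:
            x = q_sample(x0_hat, t - 1, torch.randn_like(x))  # re-noise to the next step
        else:
            x = x0_hat
    return x

# Example: in-between between a known first and last 10 frames.
known = torch.zeros(FRAMES, JOINT_DIM)
mask = torch.zeros(FRAMES, 1)
mask[:10] = 1.0
mask[-10:] = 1.0
music_feats = torch.randn(FRAMES, 32)   # placeholder for Jukebox music features
motion = edit_sample(known, mask, music_feats)
print(motion.shape)                     # torch.Size([60, 144])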

Cite

Text

Tseng et al. "EDGE: Editable Dance Generation from Music." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00051

Markdown

[Tseng et al. "EDGE: Editable Dance Generation from Music." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/tseng2023cvpr-edge/) doi:10.1109/CVPR52729.2023.00051

BibTeX

@inproceedings{tseng2023cvpr-edge,
  title     = {{EDGE: Editable Dance Generation from Music}},
  author    = {Tseng, Jonathan and Castellon, Rodrigo and Liu, C. Karen},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {448--458},
  doi       = {10.1109/CVPR52729.2023.00051},
  url       = {https://mlanthology.org/cvpr/2023/tseng2023cvpr-edge/}
}