Length-Aware Motion Synthesis via Latent Diffusion

Abstract

The target duration of a synthesized human motion is a critical attribute that requires modeling control over the motion dynamics and style. Speeding up an action performance is not merely fast-forwarding it. However, state-of-the-art techniques for human behavior synthesis have limited control over the target sequence length. We introduce the problem of generating length-aware 3D human motion sequences from textual descriptors, and we propose a novel model to synthesize motions of variable target lengths, which we dub “Length-Aware Latent Diffusion” (LADiff). LADiff consists of two new modules: 1) a length-aware variational auto-encoder to learn motion representations with length-dependent latent codes; 2) a length-conforming latent diffusion model to generate motions with a richness of details that increases with the required target sequence length. LADiff significantly improves over the state-of-the-art across most of the existing motion synthesis metrics on the two established benchmarks of HumanML3D and KIT-ML. The code is available at https://github.com/AlessioSam/LADiff.
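
The two modules named in the abstract can be pictured with a short sketch. The PyTorch code below is not the authors' implementation (see the linked repository for that): the token-bank size, the rule mapping target length to active latent tokens, the GRU encoder/decoder, and the length-conditioned denoiser are all illustrative assumptions, chosen only to show the idea of a length-dependent latent code paired with a length-conditioned latent diffusion step.

# A minimal sketch of the two ideas named in the abstract, NOT the authors'
# implementation: (1) a VAE whose latent code is length-dependent, realized
# here by masking a fixed bank of latent tokens so that longer targets keep
# more tokens active, and (2) a denoiser conditioned on the target length.
# All sizes, the token-activation rule, and the conditioning are assumptions.
import torch
import torch.nn as nn

MAX_TOKENS = 8      # assumed size of the latent token bank
LATENT_DIM = 64     # assumed latent width
MAX_FRAMES = 196    # HumanML3D-style maximum sequence length


def active_tokens(target_len: int) -> int:
    # Assumed rule: longer motions activate proportionally more latent tokens.
    return max(1, round(target_len / MAX_FRAMES * MAX_TOKENS))


class LengthAwareVAE(nn.Module):
    def __init__(self, pose_dim: int = 263):
        super().__init__()
        self.enc = nn.GRU(pose_dim, LATENT_DIM, batch_first=True)
        self.to_mu = nn.Linear(LATENT_DIM, MAX_TOKENS * LATENT_DIM)
        self.to_logvar = nn.Linear(LATENT_DIM, MAX_TOKENS * LATENT_DIM)
        self.dec = nn.GRU(LATENT_DIM, LATENT_DIM, batch_first=True)
        self.out = nn.Linear(LATENT_DIM, pose_dim)

    def encode(self, motion, target_len):
        _, h = self.enc(motion)                      # h: (1, B, LATENT_DIM)
        h = h.squeeze(0)
        mu = self.to_mu(h).view(-1, MAX_TOKENS, LATENT_DIM)
        logvar = self.to_logvar(h).view(-1, MAX_TOKENS, LATENT_DIM)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        mask = torch.zeros(1, MAX_TOKENS, 1, device=z.device)
        mask[:, :active_tokens(target_len)] = 1.0    # zero out inactive tokens
        return z * mask, mu, logvar

    def decode(self, z, target_len):
        # Broadcast the pooled latent over the requested number of frames;
        # a real decoder would be far more expressive (e.g. a transformer).
        ctx = z.mean(dim=1, keepdim=True).expand(-1, target_len, -1)
        h, _ = self.dec(ctx)
        return self.out(h)


class LengthConditionedDenoiser(nn.Module):
    # Predicts the noise on the latent tokens, conditioned on the diffusion
    # timestep and the target length (both embedded by one linear layer).
    def __init__(self):
        super().__init__()
        self.cond = nn.Linear(2, LATENT_DIM)
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 256), nn.SiLU(), nn.Linear(256, LATENT_DIM))

    def forward(self, z_noisy, t, target_len):
        c = self.cond(torch.tensor([[float(t), float(target_len)]]))
        return self.net(z_noisy + c)                 # condition broadcasts over tokens


vae = LengthAwareVAE()
motion = torch.randn(2, 196, 263)                    # (batch, frames, pose features)
z, mu, logvar = vae.encode(motion, target_len=60)    # only 2 of 8 tokens active
recon = vae.decode(z, target_len=60)                 # (2, 60, 263)
eps_hat = LengthConditionedDenoiser()(z, t=500, target_len=60)

In this sketch, a 60-frame request keeps 2 of the 8 latent tokens while a 196-frame request keeps all of them, mirroring the abstract's point that the richness of the latent representation, and hence of the generated details, grows with the required target length.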

Cite

Text

Sampieri et al. "Length-Aware Motion Synthesis via Latent Diffusion." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73668-1_7

Markdown

[Sampieri et al. "Length-Aware Motion Synthesis via Latent Diffusion." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/sampieri2024eccv-lengthaware/) doi:10.1007/978-3-031-73668-1_7

BibTeX

@inproceedings{sampieri2024eccv-lengthaware,
  title     = {{Length-Aware Motion Synthesis via Latent Diffusion}},
  author    = {Sampieri, Alessio and Palma, Alessio and Spinelli, Indro and Galasso, Fabio},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73668-1_7},
  url       = {https://mlanthology.org/eccv/2024/sampieri2024eccv-lengthaware/}
}