Motion Mamba: Efficient and Long Sequence Motion Generation

Abstract

Human motion generation stands as a significant pursuit in generative computer vision, while achieving long-sequence and efficient motion generation remains challenging. Recent advancements in state space models (SSMs), notably Mamba, have showcased considerable promise in long sequence modeling with an efficient hardware-aware design, which appears to be a significant direction upon building motion generation model. Nevertheless, adapting SSMs to motion generation faces hurdles since the lack of a specialized design architecture to model motion sequence. To address these challenges, we propose Motion Mamba, a simple yet efficient approach that presents the pioneering motion generation model utilized SSMs. Specifically, we design a Hierarchical Temporal Mamba (HTM) block to process temporal data by ensemble varying numbers of isolated SSM modules across a symmetric U-Net architecture aimed at preserving motion consistency between frames. We also design a Bidirectional Spatial Mamba (BSM) block to bidirectionally process latent poses, to enhance accurate motion generation within a temporal frame. Our proposed method achieves up to 50% FID improvement and up to 4 times faster on the HumanML3D and KIT-ML datasets compared to the previous best diffusion-based method, which demonstrates strong capabilities of high-quality long sequence motion modeling and real-time human motion generation.

Cite

Text

Zhang et al. "Motion Mamba: Efficient and Long Sequence Motion Generation." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73232-4_15

Markdown

[Zhang et al. "Motion Mamba: Efficient and Long Sequence Motion Generation." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/zhang2024eccv-motion/) doi:10.1007/978-3-031-73232-4_15

BibTeX

@inproceedings{zhang2024eccv-motion,
  title     = {{Motion Mamba: Efficient and Long Sequence Motion Generation}},
  author    = {Zhang, Zeyu and Liu, Akide and Reid, Ian and Hartley, Richard and Zhuang, Bohan and Tang, Hao},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73232-4_15},
  url       = {https://mlanthology.org/eccv/2024/zhang2024eccv-motion/}
}