PoseGPT: Quantization-Based 3D Human Motion Generation and Forecasting

Abstract

We address the problem of action-conditioned generation of human motion sequences. Existing work falls into two categories: forecasting models conditioned on observed past motions, or generative models conditioned on action labels and duration only. In contrast, we generate motion conditioned on observations of arbitrary length, including none. To solve this generalized problem, we propose PoseGPT, an auto-regressive transformer-based approach which internally compresses human motion into quantized latent sequences. An auto-encoder first maps human motion to latent index sequences in a discrete space, and vice versa. Inspired by the Generative Pretrained Transformer (GPT), we propose to train a GPT-like model for next-index prediction in that space; this allows PoseGPT to output distributions over possible futures, with or without conditioning on past motion. The discrete and compressed nature of the latent space allows the GPT-like model to focus on long-range signal, as it removes low-level redundancy in the input signal. Predicting discrete indices also alleviates the common pitfall of predicting averaged poses, a typical failure case when regressing continuous values, as the average of discrete targets is not a target itself. Our experimental results show that our proposed approach achieves state-of-the-art results on HumanAct12, a standard but small-scale dataset; on BABEL, a recent large-scale MoCap dataset; and on GRAB, a human-object interaction dataset.
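The quantization idea and the "averaged pose" argument in the abstract can be illustrated with a toy sketch. This is not the PoseGPT implementation: the 2-D `CODEBOOK` values, the function names (`quantize`, `encode`, `decode`, `mean_vec`), and the hand-written probabilities are all hypothetical stand-ins for a learned codebook, a learned auto-encoder, and a trained GPT-like predictor.

```python
import random

# Hypothetical toy codebook: each entry stands in for a learned latent vector.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def quantize(vec, codebook=CODEBOOK):
    """Return the index of the codebook entry nearest to vec (squared Euclidean)."""
    def sq_dist(entry):
        return sum((a - b) ** 2 for a, b in zip(vec, entry))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def encode(latents):
    """Map a sequence of continuous latents to a sequence of discrete indices."""
    return [quantize(v) for v in latents]


def decode(indices):
    """Map discrete indices back to their codebook vectors."""
    return [CODEBOOK[i] for i in indices]


def mean_vec(a, b):
    """Element-wise average of two vectors."""
    return tuple((x + y) / 2 for x, y in zip(a, b))


# Two equally plausible "futures", each a valid codebook entry.
future_a, future_b = CODEBOOK[1], CODEBOOK[2]

# Regressing continuous values under a squared-error loss tends toward the
# mean of the targets, which here is not a valid codebook entry.
regressed = mean_vec(future_a, future_b)

# Predicting an index instead yields a categorical distribution; sampling
# from it always returns one of the valid entries. These probabilities are
# written by hand purely for illustration.
probs = [0.0, 0.5, 0.5, 0.0]
sampled = random.Random(0).choices(range(len(CODEBOOK)), weights=probs, k=1)[0]
```

Here `regressed` falls between the two futures and matches neither, while `decode([sampled])` returns one of them intact, which is the sense in which the average of discrete targets is not itself a target.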

Cite

Text

Lucas et al. "PoseGPT: Quantization-Based 3D Human Motion Generation and Forecasting." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-20068-7_24

Markdown

[Lucas et al. "PoseGPT: Quantization-Based 3D Human Motion Generation and Forecasting." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/lucas2022eccv-posegpt/) doi:10.1007/978-3-031-20068-7_24

BibTeX

@inproceedings{lucas2022eccv-posegpt,
  title     = {{PoseGPT: Quantization-Based 3D Human Motion Generation and Forecasting}},
  author    = {Lucas, Thomas and Baradel, Fabien and Weinzaepfel, Philippe and Rogez, Grégory},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-20068-7_24},
  url       = {https://mlanthology.org/eccv/2022/lucas2022eccv-posegpt/}
}