Unsupervised Motion Representation Learning with Capsule Autoencoders

Abstract

We propose the Motion Capsule Autoencoder (MCAE), which addresses a key challenge in the unsupervised learning of motion representations: transformation invariance. MCAE models motion in a two-level hierarchy. At the lower level, a spatio-temporal motion signal is divided into short, local, semantic-agnostic snippets. At the higher level, the snippets are aggregated to form full-length, semantic-aware segments. At both levels, we represent motion with a set of learned, transformation-invariant templates and the corresponding geometric transformations, using capsule autoencoders of a novel design. This leads to a robust and efficient encoding of viewpoint changes. MCAE is evaluated on the novel Trajectory20 motion dataset and on several real-world skeleton-based human action datasets. Notably, it outperforms baselines on Trajectory20 with considerably fewer parameters and achieves state-of-the-art performance on unsupervised skeleton-based action recognition.
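To make the lower-level idea concrete, here is a minimal, hypothetical sketch of the "template plus transformation" view of a snippet: a trajectory is split into short fixed-length snippets, and each snippet is described by a geometric transformation (here just scale and translation, fit by least squares) of a shared template. The function names and the fitting procedure are illustrative assumptions, not the paper's actual capsule architecture.

```python
import numpy as np

def split_into_snippets(trajectory, snippet_len):
    """Divide a (T, 2) trajectory into non-overlapping (snippet_len, 2) snippets."""
    T = trajectory.shape[0] // snippet_len * snippet_len
    return trajectory[:T].reshape(-1, snippet_len, 2)

def fit_transform(snippet, template):
    """Fit scale s and translation t so that s * template + t approximates the snippet.

    This stands in for the capsule's learned pose parameters; in MCAE both
    the templates and the transformations are learned, not fit in closed form.
    """
    x, y = template.ravel(), snippet.ravel()
    A = np.stack([x, np.ones_like(x)], axis=1)  # least squares over all coordinates
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s, t

# Example: a straight-line 2-D trajectory; the template is a unit line segment.
traj = np.stack([np.linspace(0, 4, 8), np.linspace(0, 4, 8)], axis=1)
template = np.stack([np.linspace(0, 1, 4), np.linspace(0, 1, 4)], axis=1)
snippets = split_into_snippets(traj, snippet_len=4)
params = [fit_transform(s, template) for s in snippets]
```

Because each snippet is stored only as transformation parameters relative to a shared template, the representation factors out viewpoint-like changes, which is the invariance property the abstract highlights.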

Cite

Text

Xu et al. "Unsupervised Motion Representation Learning with Capsule Autoencoders." Neural Information Processing Systems, 2021.

Markdown

[Xu et al. "Unsupervised Motion Representation Learning with Capsule Autoencoders." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/xu2021neurips-unsupervised/)

BibTeX

@inproceedings{xu2021neurips-unsupervised,
  title     = {{Unsupervised Motion Representation Learning with Capsule Autoencoders}},
  author    = {Xu, Ziwei and Shen, Xudong and Wong, Yongkang and Kankanhalli, Mohan S},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/xu2021neurips-unsupervised/}
}