MOSAIC: Skeleton-Based Human Motion Recognition with Compositional Representations
Abstract
Motion recognition is important across many application domains. Deep architectures are the gold standard and achieve impressive results, but generally at the price of very high model complexity and poor interpretability. Drawing inspiration from the compositional nature of human motion, in this work we investigate a modular architecture that includes a VAE for the unsupervised learning of a compositional and minimal action representation, followed by a Transformer that classifies the resulting action sentence. We assess our approach on the BABEL dataset, comparing various positional and kinematic features as input. Our results demonstrate that, despite the simplicity of the representation, our model provides a good trade-off between effectiveness, efficiency and interpretability. These insights pave the way for employing this methodology in diverse tasks, including motion generation.
Cite
Text
Tomenotti and Noceti. "MOSAIC: Skeleton-Based Human Motion Recognition with Compositional Representations." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-91578-9_23
Markdown
[Tomenotti and Noceti. "MOSAIC: Skeleton-Based Human Motion Recognition with Compositional Representations." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/tomenotti2024eccvw-mosaic/) doi:10.1007/978-3-031-91578-9_23
BibTeX
@inproceedings{tomenotti2024eccvw-mosaic,
title = {{MOSAIC: Skeleton-Based Human Motion Recognition with Compositional Representations}},
author = {Tomenotti, Federico Figari and Noceti, Nicoletta},
booktitle = {European Conference on Computer Vision Workshops},
year = {2024},
pages = {299--309},
doi = {10.1007/978-3-031-91578-9_23},
url = {https://mlanthology.org/eccvw/2024/tomenotti2024eccvw-mosaic/}
}