Infant Action Generative Modeling

Abstract

Despite advancements in human motion generation models, their performance drops in infant motion generation due to the limited available data and the lack of 3D skeleton ground truth. To address this, we introduce the infant action generation and classification (InfAGenC) pipeline, which combines a transformer-based variational autoencoder (VAE) with a spatial-temporal graph convolutional network (STGCN) to create synthetic infant action samples. By iteratively refining the generative model with diverse and accurate data, we improve the realism of the synthetic data, leading to more precise infant action recognition models. Our results show significant improvements in action recognition performance on real-world data, demonstrating that synthetic data can enhance small training datasets and advance infant action recognition. Our pipeline increases action recognition accuracy to up to 88.58% on the infant action dataset and up to 98% on an adult action dataset.
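The iterative refinement loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the VAE generator and STGCN classifier are replaced by hypothetical stand-in functions, and only the loop structure (generate, filter by classifier confidence, retrain on the grown dataset) reflects the pipeline the abstract describes.

```python
# Hedged sketch of an InfAGenC-style refinement loop. All functions below are
# hypothetical stand-ins for the transformer-VAE generator and STGCN classifier.

def train_generator(data):
    """Stand-in for fitting the transformer-based VAE on action sequences."""
    return {"train_size": len(data)}

def sample_synthetic(generator, n):
    """Stand-in for drawing synthetic skeleton sequences from the VAE."""
    return [("synthetic", i) for i in range(n)]

def classifier_confidence(sample):
    """Stand-in for the STGCN's confidence that a sample is a realistic action.

    Deterministic pseudo-score so the sketch is reproducible.
    """
    _, i = sample
    return ((i * 37) % 100) / 100.0

def refine(real_data, rounds=3, n_samples=100, threshold=0.5):
    """Grow the training set with high-confidence synthetic samples each round."""
    data = list(real_data)
    for _ in range(rounds):
        generator = train_generator(data)
        candidates = sample_synthetic(generator, n_samples)
        # Keep only samples the action classifier deems plausible.
        data += [s for s in candidates if classifier_confidence(s) >= threshold]
    return data

augmented = refine([("real", i) for i in range(50)])
```

In the actual pipeline the classifier filter and the retraining step would both operate on 3D skeleton sequences; here they are reduced to labeled tuples to keep the loop readable.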

Cite

Text

Huang et al. "Infant Action Generative Modeling." Winter Conference on Applications of Computer Vision, 2025.

Markdown

[Huang et al. "Infant Action Generative Modeling." Winter Conference on Applications of Computer Vision, 2025.](https://mlanthology.org/wacv/2025/huang2025wacv-infant/)

BibTeX

@inproceedings{huang2025wacv-infant,
  title     = {{Infant Action Generative Modeling}},
  author    = {Huang, Xiaofei and Hatamimajoumerd, Elaheh and Mathew, Amal and Ostadabbas, Sarah},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2025},
  pages     = {253--265},
  url       = {https://mlanthology.org/wacv/2025/huang2025wacv-infant/}
}