Using Language-Aligned Gesture Embeddings for Understanding Gestures Accompanying Math Terms
Abstract
In this paper, we introduce an approach for recognizing and classifying gestures that accompany mathematical terms, in a new collection we name the "GAMT" dataset. Our method uses language as a means of providing context to classify gestures. Specifically, we use a CLIP-style framework to construct a shared embedding space for gestures and language, experimenting with various methods for encoding gestures within this space. We evaluate our method on our new dataset, which contains a wide array of gestures associated with mathematical terms. The shared embedding space leads to a substantial improvement in gesture classification. Furthermore, we identify an efficient model that excels at classifying gestures from our dataset, contributing to the further development of gesture recognition in diverse interaction scenarios.
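The CLIP-style framework described above can be sketched as a symmetric contrastive (InfoNCE) objective over paired gesture and language embeddings: matched pairs in a batch are pulled together in the shared space, and all other pairings serve as negatives. The function names, dimensions, and temperature below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Project embeddings onto the unit sphere so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def clip_style_loss(gesture_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired gesture/text embeddings.

    Row i of `gesture_emb` and row i of `text_emb` are treated as a matched
    pair; every other row pairing in the batch acts as a negative.
    """
    g = l2_normalize(gesture_emb)
    t = l2_normalize(text_emb)
    logits = g @ t.T / temperature          # (B, B) cosine-similarity matrix
    labels = np.arange(logits.shape[0])     # diagonal entries are the positives

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the gesture-to-text and text-to-gesture directions.
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2.0
```

A trained gesture encoder would then classify a gesture by embedding it and retrieving the nearest math-term text embedding in the shared space.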
Cite
Text
Maidment et al. "Using Language-Aligned Gesture Embeddings for Understanding Gestures Accompanying Math Terms." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00228
Markdown
[Maidment et al. "Using Language-Aligned Gesture Embeddings for Understanding Gestures Accompanying Math Terms." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/maidment2024cvprw-using/) doi:10.1109/CVPRW63382.2024.00228
BibTeX
@inproceedings{maidment2024cvprw-using,
title = {{Using Language-Aligned Gesture Embeddings for Understanding Gestures Accompanying Math Terms}},
author = {Maidment, Tristan and Patel, Purav J. and Walker, Erin and Kovashka, Adriana},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2024},
pages = {2227--2237},
doi = {10.1109/CVPRW63382.2024.00228},
url = {https://mlanthology.org/cvprw/2024/maidment2024cvprw-using/}
}