Hands by Hand: Crowd-Sourced Motion Tracking for Gesture Annotation
Abstract
We describe a method for using crowd-sourced labor to track motion and ultimately annotate human gestures in video. Our deployment platform, Amazon Mechanical Turk, divides labor into HITs (Human Intelligence Tasks). Given the informational density of video, our task is potentially much larger than a traditional HIT, which involves processing a block of text or a single image. We exploit redundancies in video data so that each worker's effort is effectively multiplied: in the end, only a fraction of frames needs to be annotated by hand, yet we still achieve complete coverage of all video frames. This is accomplished by combining HITs built on a novel user interface with automatic techniques such as template tracking and affinity propagation clustering. In a case study, we show how to annotate a video database of political speeches with 2D positions and 3D hand pose configurations, and we use the resulting data for preliminary analysis.
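To make the frame-selection idea concrete, here is a minimal sketch, not the authors' implementation, of how affinity propagation can pick exemplar frames for hand annotation and then propagate each exemplar's annotation to the rest of its cluster. The per-frame features (flattened grayscale pixels), the synthetic data, and the propagation rule are assumptions chosen for illustration; scikit-learn's `AffinityPropagation` stands in for whatever clustering implementation the paper actually used.

```python
# Illustrative sketch only: exploit redundancy in video by clustering frames
# with affinity propagation, hand-annotating only the exemplars, and letting
# every other frame inherit its cluster exemplar's annotation.
import numpy as np
from sklearn.cluster import AffinityPropagation

def frame_features(frames):
    """Flatten small grayscale frames into feature vectors (placeholder features)."""
    return np.array([np.asarray(f, dtype=np.float32).ravel() for f in frames])

def pick_exemplars(frames):
    """Cluster frames; the exemplar frames are the ones sent to workers as HITs."""
    X = frame_features(frames)
    ap = AffinityPropagation(random_state=0).fit(X)
    return ap.cluster_centers_indices_, ap.labels_

def propagate_annotations(exemplar_idx, labels, hand_annotations):
    """Assign every frame the annotation of its cluster's exemplar.

    hand_annotations maps an exemplar frame index to its worker-supplied annotation.
    """
    return [hand_annotations[int(exemplar_idx[k])] for k in labels]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake video: 60 tiny frames cycling through three underlying "poses".
    poses = rng.random((3, 16, 16))
    frames = [poses[i % 3] + 0.01 * rng.random((16, 16)) for i in range(60)]
    exemplars, labels = pick_exemplars(frames)
    annotations = {int(i): f"pose_{k}" for k, i in enumerate(exemplars)}
    full = propagate_annotations(exemplars, labels, annotations)
    print(f"{len(exemplars)} exemplar HITs cover {len(frames)} frames")
```

With strongly redundant footage, the number of exemplars (and hence HITs) is far smaller than the number of frames, which is the multiplication of worker effort the abstract describes.

Similarly, a rough sketch of the template-tracking step: given a worker-supplied 2D position in one frame, a fixed-size patch around it is matched into the next frame. OpenCV's `cv2.matchTemplate` is used here as a stand-in; the patch size, the matching score, and the absence of drift correction are all simplifying assumptions.

```python
# Illustrative sketch only: propagate a hand-annotated 2D position to the
# next frame with simple template matching.
import cv2
import numpy as np

def track_template(prev_frame, next_frame, center, half=16):
    """Given an annotated position in prev_frame, locate the same patch in next_frame."""
    x, y = center
    template = prev_frame[y - half:y + half, x - half:x + half]
    # Normalized cross-correlation of the patch against the whole next frame.
    result = cv2.matchTemplate(next_frame, template, cv2.TM_CCOEFF_NORMED)
    _, _, _, max_loc = cv2.minMaxLoc(result)
    # matchTemplate reports the top-left corner of the best-matching window,
    # so shift back to the patch center.
    return (max_loc[0] + half, max_loc[1] + half)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame0 = (rng.random((120, 160)) * 255).astype(np.uint8)
    # Simulate the annotated patch shifting 3 px right and 2 px down.
    frame1 = np.roll(np.roll(frame0, 2, axis=0), 3, axis=1)
    print(track_template(frame0, frame1, center=(80, 60)))  # expect about (83, 62)
```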
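In a real pipeline of this kind, the tracker would run between hand-annotated exemplar frames in both directions, with worker annotations re-anchoring it whenever the match confidence drops.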
Cite
Text
Spiro et al. "Hands by Hand: Crowd-Sourced Motion Tracking for Gesture Annotation." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2010. doi:10.1109/CVPRW.2010.5543191
Markdown
[Spiro et al. "Hands by Hand: Crowd-Sourced Motion Tracking for Gesture Annotation." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2010.](https://mlanthology.org/cvprw/2010/spiro2010cvprw-hands/) doi:10.1109/CVPRW.2010.5543191
BibTeX
@inproceedings{spiro2010cvprw-hands,
title = {{Hands by Hand: Crowd-Sourced Motion Tracking for Gesture Annotation}},
author = {Spiro, Ian and Taylor, Geoffrey and Williams, George and Bregler, Christoph},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2010},
pages = {17--24},
doi = {10.1109/CVPRW.2010.5543191},
url = {https://mlanthology.org/cvprw/2010/spiro2010cvprw-hands/}
}