Recognizing Action at a Distance

Abstract

Our goal is to recognize human action at a distance, at resolutions where a whole person may be, say, 30 pixels tall. We introduce a novel motion descriptor based on optical flow measurements in a spatiotemporal volume for each stabilized human figure, and an associated similarity measure to be used in a nearest-neighbor framework. Making use of noisy optical flow measurements is the key challenge, which is addressed by treating optical flow not as precise pixel displacements, but rather as a spatial pattern of noisy measurements that are carefully smoothed and aggregated to form our spatiotemporal motion descriptor. To classify the action being performed by a human figure in a query sequence, we retrieve nearest neighbor(s) from a database of stored, annotated video sequences. We can also use these retrieved exemplars to transfer 2D/3D skeletons onto the figures in the query sequence, as well as to perform two forms of data-based action synthesis: "do as I do" and "do as I say". Results are demonstrated on ballet, tennis, and football datasets.
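The pipeline the abstract describes (smooth and aggregate a noisy flow field into a motion descriptor, then classify with a similarity measure in a nearest-neighbor framework) can be sketched roughly as below. This is an illustrative reading, not the paper's exact recipe: the half-wave rectification into four channels, the box-blur smoothing, and all function names here are assumptions chosen for a minimal self-contained example.

```python
import numpy as np

def box_blur(img, k=3):
    """Separable box blur: an illustrative stand-in for the paper's smoothing."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    v = sum(p[i:i + img.shape[0], :] for i in range(k)) / k      # vertical pass
    return sum(v[:, i:i + img.shape[1]] for i in range(k)) / k   # horizontal pass

def motion_descriptor(flow_x, flow_y):
    """Turn one frame's noisy flow field into a smoothed motion descriptor.

    Half-wave rectify each flow component into positive/negative channels,
    then blur each channel, so the descriptor captures the spatial pattern
    of motion rather than exact pixel displacements (an assumed construction).
    """
    channels = [np.maximum(flow_x, 0.0), np.maximum(-flow_x, 0.0),
                np.maximum(flow_y, 0.0), np.maximum(-flow_y, 0.0)]
    return np.stack([box_blur(c) for c in channels])

def similarity(d_a, d_b, eps=1e-8):
    """Normalized correlation between descriptors (one choice of measure)."""
    a, b = d_a.ravel(), d_b.ravel()
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps)

def classify(query, database):
    """Nearest-neighbor classification over (descriptor, label) pairs."""
    return max(database, key=lambda item: similarity(query, item[0]))[1]
```

In the paper, descriptors are accumulated over a spatiotemporal volume of stabilized frames and matched frame-to-frame; the single-frame version above only illustrates the rectify-smooth-correlate idea.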

Cite

Text

Efros et al. "Recognizing Action at a Distance." IEEE/CVF International Conference on Computer Vision, 2003. doi:10.1109/ICCV.2003.1238420

Markdown

[Efros et al. "Recognizing Action at a Distance." IEEE/CVF International Conference on Computer Vision, 2003.](https://mlanthology.org/iccv/2003/efros2003iccv-recognizing/) doi:10.1109/ICCV.2003.1238420

BibTeX

@inproceedings{efros2003iccv-recognizing,
  title     = {{Recognizing Action at a Distance}},
  author    = {Efros, Alexei A. and Berg, Alexander C. and Mori, Greg and Malik, Jitendra},
  booktitle = {IEEE/CVF International Conference on Computer Vision},
  year      = {2003},
  pages     = {726--733},
  doi       = {10.1109/ICCV.2003.1238420},
  url       = {https://mlanthology.org/iccv/2003/efros2003iccv-recognizing/}
}