Action Detection in Complex Scenes with Spatial and Temporal Ambiguities
Abstract
In this paper, we investigate the detection of semantic human actions in complex scenes. Unlike conventional action recognition in well-controlled environments, action detection in complex scenes suffers from cluttered backgrounds, heavy crowds, occluded bodies, and spatial-temporal boundary ambiguities caused by imperfect human detection and tracking. Conventional algorithms are likely to fail with such spatial-temporal ambiguities. In this work, the candidate regions of an action are treated as a bag of instances. Then a novel multiple-instance learning framework, named SMILE-SVM (Simulated annealing Multiple Instance LEarning Support Vector Machines), is presented for learning a human action detector based on imprecise action locations. SMILE-SVM is extensively evaluated with satisfactory performance on two tasks: 1) human action detection on a public video action database with cluttered backgrounds, and 2) a real-world problem of detecting whether the customers in a shopping mall show an intention to purchase the merchandise on the shelf (even if they didn’t buy it eventually). In addition, the complementary nature of motion and appearance features in action detection is also validated, demonstrating a boosted performance in our experiments.
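The "bag of instances" idea in the abstract can be illustrated with a small sketch. It is not SMILE-SVM itself, since the abstract gives no details of the simulated annealing or SVM training steps. It only shows the standard multiple-instance assumption that a bag (here, the candidate regions around one imprecisely localized action) is positive when at least one of its instances scores positive. The linear scorer, the weights `w` and `b`, and the toy feature vectors are all hypothetical values chosen for illustration.

```python
from typing import List, Sequence

# One candidate region is represented by a feature vector (hypothetical features).
Instance = Sequence[float]


def instance_score(w: Sequence[float], b: float, x: Instance) -> float:
    """Linear decision value w . x + b for a single candidate region."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def bag_score(w: Sequence[float], b: float, bag: List[Instance]) -> float:
    """Standard MIL assumption: a bag scores as high as its best instance."""
    return max(instance_score(w, b, x) for x in bag)


def predict_bag(w: Sequence[float], b: float, bag: List[Instance]) -> int:
    """Label a bag +1 if any candidate region scores above zero, else -1."""
    return 1 if bag_score(w, b, bag) > 0 else -1


# Toy, made-up detector parameters and bags (not taken from the paper).
w = [1.0, -1.0]
b = -0.5
positive_bag = [[0.1, 0.9], [2.0, 0.2], [0.3, 0.4]]  # one region fits the action
negative_bag = [[0.1, 0.9], [0.2, 0.5]]              # no region fits the action

print(predict_bag(w, b, positive_bag))
print(predict_bag(w, b, negative_bag))
```

Taking the maximum over instances is what lets a detector tolerate imprecise locations: only one candidate region per positive bag needs to align well with the true action.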
Cite
Text
Hu et al. "Action Detection in Complex Scenes with Spatial and Temporal Ambiguities." IEEE/CVF International Conference on Computer Vision, 2009. doi:10.1109/ICCV.2009.5459153
Markdown
[Hu et al. "Action Detection in Complex Scenes with Spatial and Temporal Ambiguities." IEEE/CVF International Conference on Computer Vision, 2009.](https://mlanthology.org/iccv/2009/hu2009iccv-action/) doi:10.1109/ICCV.2009.5459153
BibTeX
@inproceedings{hu2009iccv-action,
title = {{Action Detection in Complex Scenes with Spatial and Temporal Ambiguities}},
author = {Hu, Yuxiao and Cao, Liangliang and Lv, Fengjun and Yan, Shuicheng and Gong, Yihong and Huang, Thomas S.},
booktitle = {IEEE/CVF International Conference on Computer Vision},
year = {2009},
pages = {128--135},
doi = {10.1109/ICCV.2009.5459153},
url = {https://mlanthology.org/iccv/2009/hu2009iccv-action/}
}