Moving Humans Detection Based on Multi-Modal Sensor Fusion
Abstract
Moving object detection plays an important role in automated surveillance systems. However, it is challenging to detect moving objects robustly in a cluttered environment. In this paper, we propose an approach for detecting humans using multi-modal measurements. The approach uses a Time-Delay Neural Network (TDNN) to fuse audio and video data at the feature level in order to detect the walker when multiple persons are in the scene. The main contribution of this paper is the introduction of a Time-Delay Neural Network for learning the relation between the visual motion and the step sounds of a walking person. Experimental results are presented.
Cite
Text
Bhanu and Zou. "Moving Humans Detection Based on Multi-Modal Sensor Fusion." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2004. doi:10.1109/CVPR.2004.403
Markdown
[Bhanu and Zou. "Moving Humans Detection Based on Multi-Modal Sensor Fusion." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2004.](https://mlanthology.org/cvpr/2004/bhanu2004cvpr-moving/) doi:10.1109/CVPR.2004.403
BibTeX
@inproceedings{bhanu2004cvpr-moving,
title = {{Moving Humans Detection Based on Multi-Modal Sensor Fusion}},
author = {Bhanu, Bir and Zou, Xiaotao},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2004},
pages = {136},
doi = {10.1109/CVPR.2004.403},
url = {https://mlanthology.org/cvpr/2004/bhanu2004cvpr-moving/}
}