Moving Humans Detection Based on Multi-Modal Sensor Fusion

Abstract

Moving object detection plays an important role in automated surveillance systems. However, it is challenging to detect moving objects robustly in a cluttered environment. In this paper, we propose an approach for detecting humans using multi-modal measurements. The approach uses a Time-Delay Neural Network (TDNN) to fuse audio and video data at the feature level in order to detect the walking person when multiple persons are in the scene. The main contribution of this paper is the introduction of the Time-Delay Neural Network for learning the relation between the visual motion and the step sounds of a walking person. Experimental results are presented.
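
As a rough illustration of the feature-level fusion idea in the abstract, the sketch below concatenates per-frame audio and video feature vectors and passes them through a single time-delay layer whose weights are shared across time. It is not the authors' implementation: the function names (`fuse_features`, `tdnn_layer`, `random_weights`), the feature dimensions, the sigmoid activation, and the untrained random weights are all assumptions made for illustration.

```python
import math
import random


def fuse_features(audio_frames, video_frames):
    """Feature-level fusion: concatenate per-frame audio and video vectors."""
    if len(audio_frames) != len(video_frames):
        raise ValueError("audio and video must have the same number of frames")
    return [a + v for a, v in zip(audio_frames, video_frames)]


def random_weights(n_units, n_delays, n_features, rng):
    """Hypothetical initialiser: weights[u][d][f], uniform in [-0.5, 0.5]."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(n_features)]
             for _ in range(n_delays)]
            for _ in range(n_units)]


def tdnn_layer(frames, weights, biases):
    """Apply one time-delay layer to a sequence of fused feature vectors.

    frames:  list of T vectors, each of length F
    weights: weights[u][d][f] for unit u, delay d, input feature f
    biases:  one bias per unit
    Returns T - D + 1 output vectors, where D is the window length.
    """
    n_delays = len(weights[0])
    outputs = []
    for t in range(len(frames) - n_delays + 1):
        window = frames[t:t + n_delays]  # D consecutive frames
        out = []
        for unit_w, b in zip(weights, biases):
            # The same weights are reused at every time step t.
            s = b + sum(w * x
                        for w_row, frame in zip(unit_w, window)
                        for w, x in zip(w_row, frame))
            out.append(1.0 / (1.0 + math.exp(-s)))  # sigmoid (assumed)
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data (assumed): 10 frames, 2 audio features, 3 video features.
    audio = [[rng.random() for _ in range(2)] for _ in range(10)]
    video = [[rng.random() for _ in range(3)] for _ in range(10)]
    fused = fuse_features(audio, video)
    w = random_weights(n_units=4, n_delays=3, n_features=5, rng=rng)
    scores = tdnn_layer(fused, w, [0.0] * 4)
    print(len(scores), "time steps,", len(scores[0]), "units each")
```

In a trained network, the output units would be learned so that they respond when the visual motion of one person is temporally consistent with the step sounds; here the weights are random, so the sketch only shows the data flow.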

Cite

Text

Bhanu and Zou. "Moving Humans Detection Based on Multi-Modal Sensor Fusion." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2004. doi:10.1109/CVPR.2004.403

Markdown

[Bhanu and Zou. "Moving Humans Detection Based on Multi-Modal Sensor Fusion." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2004.](https://mlanthology.org/cvprw/2004/bhanu2004cvprw-moving/) doi:10.1109/CVPR.2004.403

BibTeX

@inproceedings{bhanu2004cvprw-moving,
  title     = {{Moving Humans Detection Based on Multi-Modal Sensor Fusion}},
  author    = {Bhanu, Bir and Zou, Xiaotao},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2004},
  pages     = {136},
  doi       = {10.1109/CVPR.2004.403},
  url       = {https://mlanthology.org/cvprw/2004/bhanu2004cvprw-moving/}
}