Person Tracking Using Audio and Depth Cues

Abstract

In this paper, a novel probabilistic Bayesian tracking scheme is proposed and applied to bimodal measurements consisting of tracking results from a depth sensor and audio recordings collected with binaural microphones. Random finite sets are used to cope with a varying number of targets, and a measurement-driven birth process is integrated to quickly localize any emerging person. A new bimodal fusion method that prioritizes the most confident modality is employed. The approach was tested on real room recordings, and experimental results show that the proposed combination of audio and depth outperforms either modality alone, particularly when multiple people talk simultaneously and occlusions are frequent.
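The core idea of confidence-prioritized fusion can be illustrated with a minimal sketch: each modality yields a likelihood over candidate states, and the more reliable modality is given a larger exponent in a geometric combination. The function name, the confidence inputs, and the geometric weighting rule below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def fuse_likelihoods(audio_lik, depth_lik, audio_conf, depth_conf):
    """Confidence-weighted fusion of two per-candidate likelihoods.

    Illustrative scheme (not the paper's exact rule): the likelihoods
    are combined geometrically, with exponents proportional to each
    modality's confidence, so the more confident modality dominates.
    """
    total = audio_conf + depth_conf
    w_audio, w_depth = audio_conf / total, depth_conf / total
    fused = (audio_lik ** w_audio) * (depth_lik ** w_depth)
    return fused / fused.sum()  # normalize to a distribution

# Example: depth is occluded (low confidence), so audio dominates.
audio_lik = np.array([0.7, 0.2, 0.1])   # audio favors candidate 0
depth_lik = np.array([0.1, 0.1, 0.8])   # occluded depth favors candidate 2
fused = fuse_likelihoods(audio_lik, depth_lik, audio_conf=0.9, depth_conf=0.1)
```

With these confidences, the fused distribution peaks at candidate 0, following the audio cue despite the (unreliable) depth evidence pointing elsewhere.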

Cite

Text

Liu et al. "Person Tracking Using Audio and Depth Cues." IEEE/CVF International Conference on Computer Vision Workshops, 2015. doi:10.1109/ICCVW.2015.97

Markdown

[Liu et al. "Person Tracking Using Audio and Depth Cues." IEEE/CVF International Conference on Computer Vision Workshops, 2015.](https://mlanthology.org/iccvw/2015/liu2015iccvw-person/) doi:10.1109/ICCVW.2015.97

BibTeX

@inproceedings{liu2015iccvw-person,
  title     = {{Person Tracking Using Audio and Depth Cues}},
  author    = {Liu, Qingju and de Campos, Teofilo and Wang, Wenwu and Jackson, Philip J. B. and Hilton, Adrian},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2015},
  pages     = {709--717},
  doi       = {10.1109/ICCVW.2015.97},
  url       = {https://mlanthology.org/iccvw/2015/liu2015iccvw-person/}
}