Multimodal Tracking for Smart Videoconferencing and Video Surveillance

Abstract

Many applications require the ability to track the 3-D motion of subjects. We build a particle-filter-based framework for multimodal tracking using multiple cameras and multiple microphone arrays. To calibrate the resulting system, we propose a method that determines the locations of all microphones using at least five loudspeakers, under the assumption that for each loudspeaker there exists a microphone very close to it. We derive the maximum likelihood (ML) estimator, which reduces to the solution of a non-linear least-squares problem. We verify the correctness and robustness of the multimodal tracker and of the self-calibration algorithm both with Monte Carlo simulations and on real data from three experimental setups.
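The abstract's core calibration idea, estimating positions from acoustic distance measurements by nonlinear least squares, can be illustrated with a small sketch. The setup below is hypothetical and much simpler than the paper's joint microphone/loudspeaker calibration: it assumes five loudspeakers at known positions and recovers a single microphone's 3-D location from noisy speaker-to-microphone distances (time of flight times the speed of sound), using `scipy.optimize.least_squares`. The coordinates and noise level are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.optimize import least_squares

# Hypothetical geometry: five loudspeakers at known 3-D positions (meters).
speakers = np.array([[0.0, 0.0, 0.0],
                     [3.0, 0.0, 0.0],
                     [0.0, 3.0, 0.0],
                     [0.0, 0.0, 3.0],
                     [3.0, 3.0, 3.0]])

# Ground-truth microphone position (to be recovered) and simulated
# distance measurements with small Gaussian noise (~5 mm).
mic_true = np.array([1.2, 0.7, 2.1])
rng = np.random.default_rng(0)
distances = np.linalg.norm(speakers - mic_true, axis=1)
distances = distances + rng.normal(scale=0.005, size=distances.shape)

def residuals(p):
    # Predicted speaker-to-mic distances minus measured ones;
    # under i.i.d. Gaussian noise, minimizing these squared residuals
    # is the ML estimate, as in the paper's formulation.
    return np.linalg.norm(speakers - p, axis=1) - distances

# Solve the nonlinear least-squares problem from a neutral initial guess.
sol = least_squares(residuals, x0=np.zeros(3))
print(sol.x)  # should land close to mic_true
```

In the paper's actual problem the loudspeaker positions are also unknown, so the optimization is over all microphone and source coordinates jointly; the co-located microphone assumption is what anchors that larger problem.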

Cite

Text

Zotkin et al. "Multimodal Tracking for Smart Videoconferencing and Video Surveillance." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2007. doi:10.1109/CVPR.2007.383525

Markdown

[Zotkin et al. "Multimodal Tracking for Smart Videoconferencing and Video Surveillance." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2007.](https://mlanthology.org/cvpr/2007/zotkin2007cvpr-multimodal/) doi:10.1109/CVPR.2007.383525

BibTeX

@inproceedings{zotkin2007cvpr-multimodal,
  title     = {{Multimodal Tracking for Smart Videoconferencing and Video Surveillance}},
  author    = {Zotkin, Dmitry N. and Raykar, Vikas C. and Duraiswami, Ramani and Davis, Larry S.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2007},
  doi       = {10.1109/CVPR.2007.383525},
  url       = {https://mlanthology.org/cvpr/2007/zotkin2007cvpr-multimodal/}
}