Multi-Modal Person Identification in a Smart Environment

Abstract

In this paper, we present a detailed analysis of multi-modal fusion for person identification in a smart environment. The multi-modal system consists of a video-based face recognition system and a speaker identification system. We investigated different score normalization, modality weighting, and modality combination schemes for fusing the individual modalities. We introduced two new modality weighting schemes, namely the cumulative ratio of correct matches (CRCM) and distance-to-second-closest (DT2ND) measures. We also assessed the effects of well-known score normalization and classifier combination methods on identification performance. Experimental results obtained on the CLEAR 2007 evaluation corpus, which contains audio-visual recordings from different smart rooms, show that CRCM-based modality weighting significantly improves correct identification rates.
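The fusion pipeline described above can be sketched in a few lines: normalize each modality's match scores, derive a per-modality confidence weight, and combine the normalized scores by a weighted sum. The sketch below uses min-max normalization and a gap-based confidence weight inspired by the DT2ND idea (the distance between the best and second-best normalized scores); the function names and exact formulas are illustrative assumptions, not the paper's precise definitions.

```python
import numpy as np

def min_max_normalize(scores):
    """Min-max normalize a 1-D array of match scores to [0, 1]."""
    s = np.asarray(scores, dtype=float)
    rng = s.max() - s.min()
    return (s - s.min()) / rng if rng > 0 else np.zeros_like(s)

def dt2nd_weight(scores):
    """Gap-based confidence: distance between the best and second-best
    normalized scores (illustrative stand-in for the paper's DT2ND measure)."""
    s = np.sort(min_max_normalize(scores))[::-1]
    return s[0] - s[1]

def fuse(face_scores, speaker_scores):
    """Weighted-sum fusion of two modalities' normalized scores.
    Each modality is weighted by its own confidence, renormalized to sum to 1."""
    w_f = dt2nd_weight(face_scores)
    w_s = dt2nd_weight(speaker_scores)
    total = w_f + w_s
    if total == 0:
        w_f = w_s = 0.5  # fall back to equal weighting
    else:
        w_f, w_s = w_f / total, w_s / total
    return w_f * min_max_normalize(face_scores) + w_s * min_max_normalize(speaker_scores)

# Toy example: scores for three enrolled identities from each modality.
face = [0.9, 0.4, 0.3]       # face modality is confident about identity 0
speaker = [0.6, 0.55, 0.2]   # speaker modality is ambiguous between 0 and 1
fused = fuse(face, speaker)
print(int(np.argmax(fused)))  # identity chosen after fusion
```

Because the face modality has a large gap between its top two scores, it receives most of the fusion weight, so the fused decision follows the face system's top match.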

Cite

Text

Ekenel et al. "Multi-Modal Person Identification in a Smart Environment." IEEE Conference on Computer Vision and Pattern Recognition, 2007. doi:10.1109/CVPR.2007.383388

Markdown

[Ekenel et al. "Multi-Modal Person Identification in a Smart Environment." IEEE Conference on Computer Vision and Pattern Recognition, 2007.](https://mlanthology.org/cvpr/2007/ekenel2007cvpr-multi/) doi:10.1109/CVPR.2007.383388

BibTeX

@inproceedings{ekenel2007cvpr-multi,
  title     = {{Multi-Modal Person Identification in a Smart Environment}},
  author    = {Ekenel, Hazim Kemal and Fischer, Mika and Jin, Qin and Stiefelhagen, Rainer},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2007},
  doi       = {10.1109/CVPR.2007.383388},
  url       = {https://mlanthology.org/cvpr/2007/ekenel2007cvpr-multi/}
}