Audio-Visual Affect Recognition Through Multi-Stream Fused HMM for HCI

Abstract

Advances in computer processing power and emerging algorithms are enabling new ways of envisioning human-computer interaction. This paper focuses on the development of a computing algorithm that uses audio and visual sensors to detect and track a user's affective state in order to aid computer decision making. Using our Multi-stream Fused Hidden Markov Model (MFHMM), we analyzed coupled audio and visual streams to detect 11 cognitive/emotive states. The MFHMM builds an optimal connection among the multiple streams according to the maximum entropy principle and the maximum mutual information criterion. Person-independent experiments on 660 sequences from 20 subjects show that the MFHMM achieves an accuracy of 80.61%, outperforming a face-only HMM, a pitch-only HMM, an energy-only HMM, and independent HMM fusion.
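To make the fusion idea concrete, below is a minimal pure-NumPy sketch of the independent-fusion baseline the paper compares against: each stream (audio, visual) is scored by its own HMM via the forward algorithm, and per-class log-likelihoods are simply summed. This is not the authors' implementation; the class labels, state/symbol counts, and the `forward_log_likelihood` helper are illustrative placeholders. The MFHMM goes further by coupling the streams, linking one stream's observations to another stream's hidden states, with the coupling chosen under the maximum mutual information criterion.

```python
import numpy as np

def forward_log_likelihood(obs, log_pi, log_A, log_B):
    """Log-likelihood of a discrete observation sequence under an HMM,
    computed with the forward algorithm in log space for stability."""
    alpha = log_pi + log_B[:, obs[0]]  # initialization with first symbol
    for o in obs[1:]:
        # logsumexp over previous states, then add the emission term
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_B[:, o]
    return np.logaddexp.reduce(alpha)

def independent_fusion_score(audio_obs, video_obs, audio_hmm, video_hmm):
    """Independent HMM fusion: the streams are treated as conditionally
    independent, so the joint score is the sum of per-stream scores.
    The MFHMM would add a cross-stream coupling term here instead."""
    return (forward_log_likelihood(audio_obs, *audio_hmm)
            + forward_log_likelihood(video_obs, *video_hmm))

# --- toy usage: classify a sequence pair among placeholder affect classes ---
rng = np.random.default_rng(0)

def random_hmm(n_states=3, n_symbols=8):
    """Random (log-space) HMM parameters standing in for trained models."""
    pi = rng.dirichlet(np.ones(n_states))
    A = rng.dirichlet(np.ones(n_states), size=n_states)
    B = rng.dirichlet(np.ones(n_symbols), size=n_states)
    return np.log(pi), np.log(A), np.log(B)

classes = ["confusion", "interest", "frustration"]  # hypothetical labels
models = {c: (random_hmm(), random_hmm()) for c in classes}  # (audio, video)

audio_obs = rng.integers(0, 8, size=40)  # e.g., quantized pitch/energy features
video_obs = rng.integers(0, 8, size=40)  # e.g., quantized facial-motion features

scores = {c: independent_fusion_score(audio_obs, video_obs, a, v)
          for c, (a, v) in models.items()}
print(max(scores, key=scores.get))
```

In practice the per-class HMMs would be trained on labeled audio and facial-feature sequences rather than randomized; the sketch only illustrates why independent fusion ignores cross-stream dependencies, which is the gap the MFHMM's coupled structure is designed to close.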

Cite

Text

Zeng et al. "Audio-Visual Affect Recognition Through Multi-Stream Fused HMM for HCI." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2005. doi:10.1109/CVPR.2005.77

Markdown

[Zeng et al. "Audio-Visual Affect Recognition Through Multi-Stream Fused HMM for HCI." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2005.](https://mlanthology.org/cvpr/2005/zeng2005cvpr-audio/) doi:10.1109/CVPR.2005.77

BibTeX

@inproceedings{zeng2005cvpr-audio,
  title     = {{Audio-Visual Affect Recognition Through Multi-Stream Fused HMM for HCI}},
  author    = {Zeng, Zhihong and Tu, Jilin and Pianfetti, Brian and Liu, Ming and Zhang, Tong and Zhang, ZhenQiu and Huang, Thomas S. and Levinson, Stephen E.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2005},
  pages     = {967--972},
  doi       = {10.1109/CVPR.2005.77},
  url       = {https://mlanthology.org/cvpr/2005/zeng2005cvpr-audio/}
}