Audiovisual Event Detection Towards Scene Understanding

Abstract

Acoustic events produced in meeting environments may contain useful information for perceptually aware interfaces and multimodal behavior analysis. This paper presents a system that detects and recognizes these events from a multimodal perspective, combining information from multiple cameras and microphones. First, spectral and temporal features are extracted from a single audio channel, and spatial localization is achieved by exploiting cross-correlation among microphone arrays. Second, several video cues obtained from multiperson tracking, motion analysis, face recognition, and object detection provide the visual counterpart of the acoustic events to be detected. Multimodal data fusion is carried out at the score level using two approaches: weighted mean average and fuzzy integral. Finally, a multimodal database containing a rich variety of acoustic events has been recorded, including manual annotations of the data, and a set of metrics allows the performance of the presented algorithms to be assessed. This dataset is made publicly available for research purposes.
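The two score-level fusion approaches named in the abstract can be illustrated with a minimal sketch. The abstract does not specify the fuzzy integral variant, the weights, or the fuzzy measure, so everything here is an assumption for illustration: the Choquet integral stands in for "fuzzy integral", and the source names (`audio`, `video`), scores, weights, and measure values are hypothetical.

```python
def weighted_mean(scores, weights):
    """Fuse per-source scores with a weighted arithmetic mean.

    scores, weights: dicts keyed by source name (hypothetical names).
    """
    total = sum(weights.values())
    return sum(weights[s] * scores[s] for s in scores) / total


def choquet_integral(scores, measure):
    """Fuse per-source scores with a Choquet fuzzy integral.

    scores:  dict source -> score in [0, 1]
    measure: dict frozenset(sources) -> fuzzy measure in [0, 1],
             defined for every coalition visited below, with the
             measure of the full set of sources equal to 1.
    """
    ordered = sorted(scores, key=scores.get)  # ascending by score
    fused, previous = 0.0, 0.0
    for i, source in enumerate(ordered):
        # Coalition of all sources scoring at least as high as this one.
        coalition = frozenset(ordered[i:])
        fused += (scores[source] - previous) * measure[coalition]
        previous = scores[source]
    return fused


# Hypothetical classifier scores for one event class.
scores = {"audio": 0.6, "video": 0.8}

# Assumed weights for the weighted mean.
weights = {"audio": 0.4, "video": 0.6}

# Assumed non-additive fuzzy measure: the two sources together are
# worth less than the sum of their individual importances (0.5 + 0.7).
measure = {
    frozenset({"audio"}): 0.5,
    frozenset({"video"}): 0.7,
    frozenset({"audio", "video"}): 1.0,
}

fused_mean = weighted_mean(scores, weights)
fused_fuzzy = choquet_integral(scores, measure)
```

With an additive measure (each coalition's value equal to the sum of its members' weights), the Choquet integral reduces to the weighted mean. A non-additive measure lets the fusion account for redundancy or complementarity between sources, which a plain weighted mean cannot express.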

Cite

Text

Canton-Ferrer et al. "Audiovisual Event Detection Towards Scene Understanding." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2009. doi:10.1109/CVPRW.2009.5204264

Markdown

[Canton-Ferrer et al. "Audiovisual Event Detection Towards Scene Understanding." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2009.](https://mlanthology.org/cvprw/2009/cantonferrer2009cvprw-audiovisual/) doi:10.1109/CVPRW.2009.5204264

BibTeX

@inproceedings{cantonferrer2009cvprw-audiovisual,
  title     = {{Audiovisual Event Detection Towards Scene Understanding}},
  author    = {Canton-Ferrer, Cristian and Butko, Taras and Segura, Carlos and Giró, Xavier and Nadeu, Climent and Hernando, Javier and Casas, Josep R.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2009},
  pages     = {81-88},
  doi       = {10.1109/CVPRW.2009.5204264},
  url       = {https://mlanthology.org/cvprw/2009/cantonferrer2009cvprw-audiovisual/}
}