Active Speakers in Context

Alcazar, Juan Leon; Caba, Fabian; Mai, Long; Perazzi, Federico; Lee, Joon-Young; Arbelaez, Pablo; Ghanem, Bernard

doi:10.1109/CVPR42600.2020.01248

CVPR 2020

Abstract

Current methods for active speaker detection focus on modeling audiovisual information from a single speaker. This strategy can be adequate for single-speaker scenarios, but it prevents accurate detection when the task is to identify which of many candidate speakers are talking. This paper introduces the Active Speaker Context, a novel representation that models relationships between multiple speakers over long time horizons. Our new model learns pairwise and temporal relations from a structured ensemble of audiovisual observations. Our experiments show that a structured feature ensemble already benefits active speaker detection performance. We also find that the proposed Active Speaker Context improves the state of the art on the AVA-ActiveSpeaker dataset, achieving an mAP of 87.1%. Moreover, ablation studies verify that this result is a direct consequence of our long-term multi-speaker analysis.

Cite

Text

Alcazar et al. "Active Speakers in Context." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.01248

Markdown

[Alcazar et al. "Active Speakers in Context." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/alcazar2020cvpr-active/) doi:10.1109/CVPR42600.2020.01248

BibTeX

@inproceedings{alcazar2020cvpr-active,
  title     = {{Active Speakers in Context}},
  author    = {Alcazar, Juan Leon and Caba, Fabian and Mai, Long and Perazzi, Federico and Lee, Joon-Young and Arbelaez, Pablo and Ghanem, Bernard},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2020},
  doi       = {10.1109/CVPR42600.2020.01248},
  url       = {https://mlanthology.org/cvpr/2020/alcazar2020cvpr-active/}
}