A Multimodality Framework for Creating Speaker/Non-Speaker Profile Databases for Real-World Video

Abstract

We propose a complete solution for full-modality person profiling of speakers and sub-modality person profiling of non-speakers in real-world videos. This is a step toward building an elaborate database of face, name, and voice correspondences for speakers appearing in news videos. In addition, we build a name-and-face correspondence database for non-speakers who appear during voice-overs. We use an unsupervised technique to create the speaker identification database, and a unique primary feature matching and parallel line matching algorithm to create the non-speaker identification database. We tested our approach on real-world data, and the results show good performance on news videos. The framework can be incorporated into a larger multimedia news video analysis system, or into a multimedia search system for efficient news video retrieval and browsing.

Cite

Text

Abbas et al. "A Multimodality Framework for Creating Speaker/Non-Speaker Profile Databases for Real-World Video." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2007. doi:10.1109/CVPR.2007.383493

Markdown

[Abbas et al. "A Multimodality Framework for Creating Speaker/Non-Speaker Profile Databases for Real-World Video." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2007.](https://mlanthology.org/cvpr/2007/abbas2007cvpr-multimodality/) doi:10.1109/CVPR.2007.383493

BibTeX

@inproceedings{abbas2007cvpr-multimodality,
  title     = {{A Multimodality Framework for Creating Speaker/Non-Speaker Profile Databases for Real-World Video}},
  author    = {Abbas, Jehanzeb and Dagli, Charlie K. and Huang, Thomas S.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2007},
  doi       = {10.1109/CVPR.2007.383493},
  url       = {https://mlanthology.org/cvpr/2007/abbas2007cvpr-multimodality/}
}