A Bayesian Framework for Fusing Multiple Word Knowledge Models in Videotext Recognition

Zhang, DongQing; Chang, Shih-Fu

doi:10.1109/CVPR.2003.1211512

CVPR 2003 pp. 528-533

Abstract

Videotext recognition is challenging due to low resolution, diverse fonts and styles, and cluttered backgrounds. Past methods improved recognition through multiple-frame averaging, image interpolation, and lexicon correction, but recognition using multi-modality language models has not been explored. In this paper, we present a formal Bayesian framework for videotext recognition that combines multiple knowledge sources using mixture models, and describe a learning approach based on Expectation-Maximization (EM). To handle unseen words, we also present a back-off smoothing approach derived from the Bayesian model. We implemented a prototype that fuses a model derived from closed captions with one derived from the British National Corpus. The closed-caption model is based on a unique time distance distribution model of videotext words and closed caption words. Our method achieves a significant performance gain, with a word recognition rate of 76.8% and a character recognition rate of 86.7%. The proposed methods also reduce false videotext detection significantly, with a false alarm rate of 8.2% without substantial loss of recall.
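As a rough illustration of the idea the abstract describes, the sketch below fuses several word priors with a mixture model, fits the mixture weights by EM, and picks the word that maximizes likelihood times fused prior. It is a generic textbook construction, not the authors' implementation. The function names, the toy probabilities, and the constant `FLOOR` (a crude stand-in for the paper's back-off smoothing, which is not reproduced here) are all assumptions made for this example.

```python
from typing import Dict, List, Sequence

# Hypothetical floor probability for words a model has never seen.
# This is a simplification, not the back-off scheme from the paper.
FLOOR = 1e-9


def em_mixture_weights(models: Sequence[Dict[str, float]],
                       words: List[str],
                       iterations: int = 50) -> List[float]:
    """Estimate mixture weights for fixed word priors with EM."""
    k = len(models)
    weights = [1.0 / k] * k
    for _ in range(iterations):
        totals = [0.0] * k
        for w in words:
            # E-step: responsibility of each model for this word.
            joint = [weights[j] * models[j].get(w, FLOOR) for j in range(k)]
            norm = sum(joint)
            for j in range(k):
                totals[j] += joint[j] / norm
        # M-step: each weight becomes the average responsibility.
        weights = [t / len(words) for t in totals]
    return weights


def fused_prior(models: Sequence[Dict[str, float]],
                weights: Sequence[float],
                word: str) -> float:
    """Mixture prior P(word) = sum_k weight_k * P_k(word)."""
    return sum(lam * m.get(word, FLOOR) for lam, m in zip(weights, models))


def recognize(likelihoods: Dict[str, float],
              models: Sequence[Dict[str, float]],
              weights: Sequence[float]) -> str:
    """Return the candidate maximizing P(image | word) * P(word)."""
    return max(likelihoods,
               key=lambda w: likelihoods[w] * fused_prior(models, weights, w))
```

With a toy closed-caption model and a toy corpus model, a visually plausible but unknown candidate such as "mews" loses to "news" once the fused prior is applied, which is the kind of correction a word knowledge model is meant to provide.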

Cite

Text

Zhang and Chang. "A Bayesian Framework for Fusing Multiple Word Knowledge Models in Videotext Recognition." IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. doi:10.1109/CVPR.2003.1211512

Markdown

[Zhang and Chang. "A Bayesian Framework for Fusing Multiple Word Knowledge Models in Videotext Recognition." IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003.](https://mlanthology.org/cvpr/2003/zhang2003cvpr-bayesian/) doi:10.1109/CVPR.2003.1211512

BibTeX

@inproceedings{zhang2003cvpr-bayesian,
  title     = {{A Bayesian Framework for Fusing Multiple Word Knowledge Models in Videotext Recognition}},
  author    = {Zhang, DongQing and Chang, Shih-Fu},
  booktitle = {IEEE Computer Society Conference on Computer Vision and Pattern Recognition},
  year      = {2003},
  pages     = {528--533},
  doi       = {10.1109/CVPR.2003.1211512},
  url       = {https://mlanthology.org/cvpr/2003/zhang2003cvpr-bayesian/}
}