Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors

Abstract

In this paper, we present a new approach for text localization in natural images that discriminates text from non-text regions at three levels: pixel, component, and text-line. First, we propose a powerful low-level filter called the Stroke Feature Transform (SFT), which extends the widely used Stroke Width Transform (SWT) by incorporating color cues of text pixels, leading to significantly enhanced performance on inter-component separation and intra-component connection. Second, based on the output of SFT, we apply two classifiers sequentially, a text component classifier and a text-line classifier, to extract text regions, eliminating the heuristic procedures commonly used in previous approaches. The two classifiers are built upon two novel Text Covariance Descriptors (TCDs) that encode both the heuristic properties and the statistical characteristics of text strokes. Finally, text regions are located by simply thresholding the text-line confidence map. Our method was evaluated on two benchmark datasets, ICDAR 2005 and ICDAR 2011, where it achieves F-measure values of 0.72 and 0.73, respectively, surpassing previous methods in accuracy by a large margin.
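To give a rough sense of what a covariance-based descriptor computes, the following is a minimal sketch of a generic region covariance descriptor: the sample covariance matrix of per-pixel feature vectors within one candidate component. It is not the authors' TCD implementation. The abstract does not list the features used, so the feature choice here (pixel coordinates and a stroke width value) and the function name `covariance_descriptor` are assumptions for illustration only.

```python
from typing import List, Sequence


def covariance_descriptor(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return the d x d sample covariance of n per-pixel feature vectors.

    Generic region covariance sketch; the feature set is an assumption,
    not the one defined in the paper.
    """
    n = len(features)
    if n < 2:
        raise ValueError("need at least two feature vectors")
    d = len(features[0])

    # Mean of each feature dimension over all pixels in the component.
    means = [sum(row[k] for row in features) / n for k in range(d)]

    # Accumulate outer products of the mean-centered feature vectors.
    cov = [[0.0] * d for _ in range(d)]
    for row in features:
        diffs = [row[k] - means[k] for k in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diffs[i] * diffs[j]

    # Unbiased estimate: divide by n - 1.
    return [[value / (n - 1) for value in line] for line in cov]


# Hypothetical per-pixel features for one component: (x, y, stroke_width).
pixels = [
    (0.0, 0.0, 2.0),
    (1.0, 0.0, 2.0),
    (0.0, 1.0, 4.0),
    (1.0, 1.0, 4.0),
]
descriptor = covariance_descriptor(pixels)
```

The resulting matrix has a fixed size (d x d) regardless of how many pixels the component contains, which is one reason covariance descriptors are convenient inputs to a classifier. In this toy example the off-diagonal entry for (y, stroke_width) is positive because the stroke width grows with y.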

Cite

Text

Huang et al. "Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors." International Conference on Computer Vision, 2013. doi:10.1109/ICCV.2013.157

Markdown

[Huang et al. "Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors." International Conference on Computer Vision, 2013.](https://mlanthology.org/iccv/2013/huang2013iccv-text/) doi:10.1109/ICCV.2013.157

BibTeX

@inproceedings{huang2013iccv-text,
  title     = {{Text Localization in Natural Images Using Stroke Feature Transform and Text Covariance Descriptors}},
  author    = {Huang, Weilin and Lin, Zhe and Yang, Jianchao and Wang, Jue},
  booktitle = {International Conference on Computer Vision},
  year      = {2013},
  doi       = {10.1109/ICCV.2013.157},
  url       = {https://mlanthology.org/iccv/2013/huang2013iccv-text/}
}