Background-Insensitive Scene Text Recognition with Text Semantic Segmentation

Abstract

Scene Text Recognition (STR) has many important applications in computer vision. Complex backgrounds remain a major challenge for STR because they interfere with text feature extraction. Many existing methods use attentional regions, bounding boxes, or polygons to reduce such interference. However, the text regions located by these methods still contain considerable background interference. In this paper, we propose BINet, a background-insensitive approach that explicitly leverages text Semantic Segmentation (SSN) to extract text more accurately. SSN is trained on a set of existing segmentation data whose volume is only 0.03% of the STR training data, which avoids the need for large-scale pixel-level annotation of the STR training data. To make effective use of the segmentation cues, we design new segmentation refinement and embedding blocks that refine text masks and reinforce visual features. Additionally, we propose an efficient pipeline that uses Synthetic Initialization (SI) for STR models trained only on real data (1.7% of the STR training data), rather than training on both synthetic and real data from scratch. Experiments show that the proposed method recognizes text in complex backgrounds more effectively, achieving state-of-the-art performance on several public datasets.

Cite

Text

Zhao et al. "Background-Insensitive Scene Text Recognition with Text Semantic Segmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19806-9_10

Markdown

[Zhao et al. "Background-Insensitive Scene Text Recognition with Text Semantic Segmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/zhao2022eccv-backgroundinsensitive/) doi:10.1007/978-3-031-19806-9_10

BibTeX

@inproceedings{zhao2022eccv-backgroundinsensitive,
  title     = {{Background-Insensitive Scene Text Recognition with Text Semantic Segmentation}},
  author    = {Zhao, Liang and Wu, Zhenyao and Wu, Xinyi and Wilsbacher, Greg and Wang, Song},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-19806-9_10},
  url       = {https://mlanthology.org/eccv/2022/zhao2022eccv-backgroundinsensitive/}
}