SEE: Towards Semi-Supervised End-to-End Scene Text Recognition

Abstract

Detecting and recognizing text in natural scene images is a challenging task that is not yet completely solved. In recent years, several new systems that try to solve at least one of the two sub-tasks (text detection and text recognition) have been proposed. In this paper we present SEE, a step towards semi-supervised neural networks for scene text detection and recognition that can be optimized end-to-end. Most existing works consist of multiple deep neural networks and several pre-processing steps. In contrast, we propose to use a single deep neural network that learns to detect and recognize text from natural images in a semi-supervised way. SEE is a network that integrates and jointly learns a spatial transformer network, which can learn to detect text regions in an image, and a text recognition network that takes the identified text regions and recognizes their textual content. We introduce the idea behind our novel approach and show its feasibility by performing a range of experiments on standard benchmark datasets, where we achieve competitive results.
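
The region-extraction step the abstract attributes to a spatial transformer can be illustrated with a small sketch. This is not the authors' implementation: it is a minimal pure-Python version of the generic affine sampling-grid and bilinear-interpolation mechanism, and the names `affine_grid` and `bilinear_sample` as well as the hand-set matrix `theta` are assumptions made for illustration. In a trained spatial transformer, `theta` would be predicted by a localization network rather than written by hand.

```python
import math


def affine_grid(theta, out_h, out_w):
    """Map every output pixel to normalized source coordinates in [-1, 1].

    theta is a 2x3 affine matrix (hypothetical, hand-set here).
    """
    grid = []
    for i in range(out_h):
        y = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0
        row = []
        for j in range(out_w):
            x = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0
            xs = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            ys = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((xs, ys))
        grid.append(row)
    return grid


def bilinear_sample(image, grid):
    """Sample a 2D image (list of rows) at the grid points.

    Neighbours that fall outside the image contribute zero.
    """
    h, w = len(image), len(image[0])
    out = []
    for row in grid:
        out_row = []
        for xs, ys in row:
            # Convert normalized coordinates to pixel coordinates.
            px = (xs + 1.0) * (w - 1) / 2.0
            py = (ys + 1.0) * (h - 1) / 2.0
            x0, y0 = math.floor(px), math.floor(py)
            fx, fy = px - x0, py - y0
            value = 0.0
            for yy, wy in ((y0, 1.0 - fy), (y0 + 1, fy)):
                for xx, wx in ((x0, 1.0 - fx), (x0 + 1, fx)):
                    if 0 <= yy < h and 0 <= xx < w:
                        value += wy * wx * image[yy][xx]
            out_row.append(value)
        out.append(out_row)
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

# The identity transform reproduces the input.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
same = bilinear_sample(image, affine_grid(identity, 3, 3))

# Scaling by 0.5 and shifting by -0.5 selects the top-left region.
theta = [[0.5, 0.0, -0.5], [0.0, 0.5, -0.5]]
crop = bilinear_sample(image, affine_grid(theta, 2, 2))
```

Because the sampled values are weighted sums of input pixels with weights that depend smoothly on `theta`, gradients can flow from a downstream recognition loss back to the transformation parameters, which is what makes joint end-to-end training of the kind described in the abstract possible.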

Cite

Text

Bartz et al. "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition." AAAI Conference on Artificial Intelligence, 2018. doi:10.1609/AAAI.V32I1.12242

Markdown

[Bartz et al. "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition." AAAI Conference on Artificial Intelligence, 2018.](https://mlanthology.org/aaai/2018/bartz2018aaai-see/) doi:10.1609/AAAI.V32I1.12242

BibTeX

@inproceedings{bartz2018aaai-see,
  title     = {{SEE: Towards Semi-Supervised End-to-End Scene Text Recognition}},
  author    = {Bartz, Christian and Yang, Haojin and Meinel, Christoph},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2018},
  pages     = {6674--6681},
  doi       = {10.1609/AAAI.V32I1.12242},
  url       = {https://mlanthology.org/aaai/2018/bartz2018aaai-see/}
}