Reading Scene Text in Deep Convolutional Sequences

Abstract

We develop a Deep-Text Recurrent Network (DTRN) that regards scene text reading as a sequence labelling problem. We leverage recent advances in deep convolutional neural networks to generate an ordered high-level sequence from a whole word image, avoiding the difficult character segmentation problem. Then a deep recurrent model, building on long short-term memory (LSTM), is developed to robustly recognise the generated CNN sequences, departing from most existing approaches, which recognise each character independently. Our model has a number of appealing properties in comparison to existing scene text recognition methods: (i) it can recognise highly ambiguous words by leveraging meaningful context information, allowing it to work reliably without either pre- or post-processing; (ii) the deep CNN feature is robust to various image distortions; (iii) it retains the explicit order information in the word image, which is essential to discriminate word strings; (iv) the model does not depend on a pre-defined dictionary, and it can process unknown words and arbitrary strings. It achieves impressive results on several benchmarks, substantially advancing the state of the art.
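To make the idea of an "ordered sequence from a whole word image" concrete, here is a minimal sketch of one common way to produce such a sequence: sliding a fixed-width window across the image from left to right. The abstract does not state how the sequence is generated, so the sliding-window scheme, the function name `sliding_windows`, and the parameters `window_width` and `stride` are illustrative assumptions, not the authors' implementation. The CNN feature extractor and the LSTM recogniser that would consume these windows are omitted.

```python
from typing import List

Image = List[List[float]]  # rows of pixel values, shape (height, width)


def sliding_windows(image: Image, window_width: int, stride: int) -> List[Image]:
    """Cut a word image into left-to-right, possibly overlapping column windows.

    Hypothetical helper: window_width and stride are illustrative parameters,
    not values taken from the paper.
    """
    if window_width <= 0 or stride <= 0:
        raise ValueError("window_width and stride must be positive")
    width = len(image[0]) if image else 0
    windows = []
    # Each start column yields one window; list order follows reading direction,
    # so the position of every window in the word is preserved.
    for start in range(0, width - window_width + 1, stride):
        windows.append([row[start:start + window_width] for row in image])
    return windows


# Toy 2x6 "image" whose pixel values equal their column index.
toy_image = [[float(c) for c in range(6)] for _ in range(2)]
sequence = sliding_windows(toy_image, window_width=4, stride=1)

# The first pixel of each window shows that left-to-right order is kept.
first_columns = [w[0][0] for w in sequence]
```

In a full pipeline under these assumptions, each window would be mapped to a feature vector by a CNN, and the resulting ordered feature sequence would be passed to a recurrent model, so no per-character segmentation is needed.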

Cite

Text

He et al. "Reading Scene Text in Deep Convolutional Sequences." AAAI Conference on Artificial Intelligence, 2016. doi:10.1609/AAAI.V30I1.10465

Markdown

[He et al. "Reading Scene Text in Deep Convolutional Sequences." AAAI Conference on Artificial Intelligence, 2016.](https://mlanthology.org/aaai/2016/he2016aaai-reading/) doi:10.1609/AAAI.V30I1.10465

BibTeX

@inproceedings{he2016aaai-reading,
  title     = {{Reading Scene Text in Deep Convolutional Sequences}},
  author    = {He, Pan and Huang, Weilin and Qiao, Yu and Loy, Chen Change and Tang, Xiaoou},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2016},
  pages     = {3501--3508},
  doi       = {10.1609/AAAI.V30I1.10465},
  url       = {https://mlanthology.org/aaai/2016/he2016aaai-reading/}
}