Gated Recurrent Convolution Neural Network for OCR

Abstract

Optical Character Recognition (OCR) aims to recognize text in natural images. Inspired by a recently proposed model for general image classification, the Recurrent Convolution Neural Network (RCNN), we propose a new architecture named Gated RCNN (GRCNN) for solving this problem. Its critical component, the Gated Recurrent Convolution Layer (GRCL), is constructed by adding a gate to the Recurrent Convolution Layer (RCL), the critical component of RCNN. The gate controls the context modulation in the RCL and balances the feed-forward information against the recurrent information. In addition, an efficient Bidirectional Long Short-Term Memory (BLSTM) is built for sequence modeling. The GRCNN is combined with the BLSTM to recognize text in natural images. The entire GRCNN-BLSTM model can be trained end-to-end. Experiments show that the proposed model outperforms existing methods on several benchmark datasets, including IIIT-5K, Street View Text (SVT), and ICDAR.
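To make the gating idea concrete, the following is a minimal sketch of one recurrent iteration of a GRCL, under simplifying assumptions not taken from the paper: the real KxK convolutions are replaced by 1x1 convolutions (plain channel-mixing matrices via `einsum`), and batch normalization is omitted. The function and variable names (`grcl_step`, `Wf`, `Wr`, `Wgf`, `Wgr`) are hypothetical; only the structure (a sigmoid gate computed from the feed-forward input and the recurrent state, elementwise scaling of the recurrent term, shared weights across iterations) follows the abstract's description.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grcl_step(u, x_prev, Wf, Wr, Wgf, Wgr):
    """One recurrent iteration of a (simplified) Gated Recurrent Convolution Layer.

    Simplifications (assumptions, not the paper's exact formulation):
    1x1 convolutions stand in for KxK convolutions; batch norm is omitted.
    u:      feed-forward input, shape (C, H, W), fixed across iterations
    x_prev: recurrent state from the previous iteration, same shape
    """
    # Gate in (0, 1): balances feed-forward vs. recurrent information.
    gate = sigmoid(np.einsum('oc,chw->ohw', Wgf, u) +
                   np.einsum('oc,chw->ohw', Wgr, x_prev))
    # The recurrent (context) term is modulated elementwise by the gate,
    # then combined with the feed-forward term and passed through ReLU.
    return np.maximum(0.0, np.einsum('oc,chw->ohw', Wf, u) +
                           gate * np.einsum('oc,chw->ohw', Wr, x_prev))

# Unrolled GRCL: the same weights are reused for T recurrent iterations.
rng = np.random.default_rng(0)
C, H, W, T = 4, 8, 8, 3
u = rng.standard_normal((C, H, W))
Wf, Wr, Wgf, Wgr = (0.1 * rng.standard_normal((C, C)) for _ in range(4))
x = np.maximum(0.0, np.einsum('oc,chw->ohw', Wf, u))  # iteration 0: feed-forward only
for _ in range(T):
    x = grcl_step(u, x, Wf, Wr, Wgf, Wgr)
print(x.shape)  # (4, 8, 8)
```

When the gate saturates at 0 the layer degenerates to a plain feed-forward convolution; when it saturates at 1 it recovers an ungated RCL, so the gate interpolates between the two regimes.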

Cite

Text

Wang and Hu. "Gated Recurrent Convolution Neural Network for OCR." Neural Information Processing Systems, 2017.

Markdown

[Wang and Hu. "Gated Recurrent Convolution Neural Network for OCR." Neural Information Processing Systems, 2017.](https://mlanthology.org/neurips/2017/wang2017neurips-gated/)

BibTeX

@inproceedings{wang2017neurips-gated,
  title     = {{Gated Recurrent Convolution Neural Network for OCR}},
  author    = {Wang, Jianfeng and Hu, Xiaolin},
  booktitle = {Neural Information Processing Systems},
  year      = {2017},
  pages     = {335--344},
  url       = {https://mlanthology.org/neurips/2017/wang2017neurips-gated/}
}