Towards End-to-End Text Spotting with Convolutional Recurrent Neural Networks

Abstract

In this work, we jointly address the problem of text detection and recognition in natural scene images based on convolutional recurrent neural networks. We propose a unified network that simultaneously localizes and recognizes text with a single forward pass, avoiding intermediate processes such as image cropping, feature re-calculation, word separation, and character grouping. In contrast to existing approaches that consider text detection and recognition as two distinct tasks and tackle them one by one, the proposed framework settles these two tasks concurrently. The whole framework can be trained end-to-end, requiring only images, ground-truth bounding boxes, and text labels. The convolutional features are calculated only once and shared by both detection and recognition, which saves processing time. Through multi-task training, the learned features become more informative and improve the overall performance. Our proposed method has achieved competitive performance on several benchmark datasets.
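The shared-feature idea in the abstract can be illustrated with a toy sketch. This is not the authors' network: `SharedBackbone`, `detect`, `recognize`, and `spot` are hypothetical names, the "feature map" is just a copy of a small integer grid, and characters are stored as code points so that the recognition head has something to decode. The only point it shows is that the features are computed once per image and that both heads read from them, with no cropping of the input image and no second feature pass.

```python
from typing import List, Tuple

Grid = List[List[int]]
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), half-open, feature-map coordinates


class SharedBackbone:
    """Hypothetical stand-in for the convolutional feature extractor."""

    def __init__(self) -> None:
        self.calls = 0  # counts how often features are computed

    def __call__(self, image: Grid) -> Grid:
        self.calls += 1
        # Toy "features": a copy of the input grid (assumption, no real convolution).
        return [row[:] for row in image]


def detect(features: Grid) -> List[Box]:
    """Toy detection head: one box enclosing all non-zero feature cells."""
    cells = [(x, y) for y, row in enumerate(features) for x, v in enumerate(row) if v]
    if not cells:
        return []
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return [(min(xs), min(ys), max(xs) + 1, max(ys) + 1)]


def recognize(features: Grid, box: Box) -> str:
    """Toy recognition head: decode the region of the shared features inside the box."""
    x0, y0, x1, y1 = box
    return "".join(chr(v) for row in features[y0:y1] for v in row[x0:x1] if v)


def spot(image: Grid, backbone: SharedBackbone) -> List[Tuple[Box, str]]:
    """Single pass: compute features once, then run both heads on them."""
    features = backbone(image)
    return [(box, recognize(features, box)) for box in detect(features)]


# Usage: a 4x6 "image" containing the word "HI" in row 1, columns 2 and 3.
image = [[0] * 6 for _ in range(4)]
image[1][2], image[1][3] = ord("H"), ord("I")
backbone = SharedBackbone()
results = spot(image, backbone)
```

In a two-stage pipeline, the recognizer would instead receive a cropped image patch and recompute features for it. Here `backbone.calls` stays at 1 however many boxes the detection head returns.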

Cite

Text

Li et al. "Towards End-to-End Text Spotting with Convolutional Recurrent Neural Networks." International Conference on Computer Vision, 2017. doi:10.1109/ICCV.2017.560

Markdown

[Li et al. "Towards End-to-End Text Spotting with Convolutional Recurrent Neural Networks." International Conference on Computer Vision, 2017.](https://mlanthology.org/iccv/2017/li2017iccv-endtoend/) doi:10.1109/ICCV.2017.560

BibTeX

@inproceedings{li2017iccv-endtoend,
  title     = {{Towards End-to-End Text Spotting with Convolutional Recurrent Neural Networks}},
  author    = {Li, Hui and Wang, Peng and Shen, Chunhua},
  booktitle = {International Conference on Computer Vision},
  year      = {2017},
  doi       = {10.1109/ICCV.2017.560},
  url       = {https://mlanthology.org/iccv/2017/li2017iccv-endtoend/}
}