Chargrid-OCR: End-to-End Trainable Optical Character Recognition Through Semantic Segmentation and Object Detection

Abstract

We present an end-to-end trainable approach for optical character recognition (OCR) on printed documents. It is based on predicting a two-dimensional character grid ('chargrid') representation of a document image as a semantic segmentation task. To identify individual character instances from the chargrid, we regard characters as objects and use object detection techniques from computer vision. We demonstrate experimentally that our method outperforms previous state-of-the-art approaches in accuracy, is easily parallelizable on GPU (and therefore significantly faster), and is easier to train.
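To make the idea of combining a per-pixel character map with per-character boxes concrete, the following is a minimal sketch of how such outputs might be decoded into text. It is not the authors' model or post-processing: it assumes the segmentation map and the boxes are already available, and the names `ALPHABET` and `decode_chargrid`, the majority vote inside each box, and the naive reading order are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical mapping from class index to character; index 0 is background.
ALPHABET = {1: "O", 2: "C", 3: "R"}


def decode_chargrid(seg_map, boxes):
    """Read out text from a per-pixel class map and character boxes.

    seg_map: list of rows, each a list of integer class indices.
    boxes:   list of (x0, y0, x1, y1) tuples, end-exclusive, one per character.
    """
    chars = []
    for x0, y0, x1, y1 in boxes:
        # Count the non-background class indices that fall inside the box.
        votes = Counter(
            seg_map[y][x]
            for y in range(y0, y1)
            for x in range(x0, x1)
            if seg_map[y][x] != 0
        )
        if not votes:
            continue  # the box covers only background pixels
        label = votes.most_common(1)[0][0]  # majority vote (an assumption)
        chars.append((y0, x0, ALPHABET[label]))
    # Naive reading order: top to bottom, then left to right.
    chars.sort()
    return "".join(c for _, _, c in chars)


# Toy 2x9 "image": three characters, one noisy pixel in the last box.
seg_map = [
    [0, 1, 1, 0, 2, 2, 0, 3, 3],
    [0, 1, 1, 0, 2, 2, 0, 3, 1],
]
boxes = [(4, 0, 6, 2), (1, 0, 3, 2), (7, 0, 9, 2)]
text = decode_chargrid(seg_map, boxes)
```

In this toy input the boxes are given out of order and one pixel is mislabeled; the vote inside each box and the final sort are what recover a clean string.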

Cite

Text

Reisswig et al. "Chargrid-OCR: End-to-End Trainable Optical Character Recognition Through Semantic Segmentation and Object Detection." NeurIPS 2019 Workshops: Document_Intelligence, 2019.

Markdown

[Reisswig et al. "Chargrid-OCR: End-to-End Trainable Optical Character Recognition Through Semantic Segmentation and Object Detection." NeurIPS 2019 Workshops: Document_Intelligence, 2019.](https://mlanthology.org/neuripsw/2019/reisswig2019neuripsw-chargridocr/)

BibTeX

@inproceedings{reisswig2019neuripsw-chargridocr,
  title     = {{Chargrid-OCR: End-to-End Trainable Optical Character Recognition Through Semantic Segmentation and Object Detection}},
  author    = {Reisswig, Christian and Katti, Anoop R and Spinaci, Marco and Höhne, Johannes},
  booktitle = {NeurIPS 2019 Workshops: Document Intelligence},
  year      = {2019},
  url       = {https://mlanthology.org/neuripsw/2019/reisswig2019neuripsw-chargridocr/}
}