General Detection-Based Text Line Recognition

Abstract

We introduce a general detection-based approach to text line recognition, whether for printed (OCR) or handwritten (HTR) text, with Latin, Chinese, or ciphered characters. Detection-based approaches have until now largely been discarded for HTR because reading characters separately is often challenging, and character-level annotation is difficult and expensive. We overcome these challenges thanks to three main insights: (i) synthetic pre-training with sufficiently diverse data makes it possible to learn reasonable character localization in any script; (ii) modern transformer-based detectors can jointly detect a large number of instances and, if trained with an adequate masking strategy, leverage consistency between the different detections; (iii) once a pre-trained detection model with approximate character localization is available, it is possible to fine-tune it with line-level annotation on real data, even with a different alphabet. Our approach thus builds on a completely different paradigm from most state-of-the-art methods, which rely on autoregressive decoding and predict character values one by one, whereas we process a complete line in parallel. Remarkably, our method demonstrates good performance on a range of scripts usually tackled with specialized approaches: Latin script, Chinese script, and ciphers, for which we significantly improve state-of-the-art performance. Our code and models are available at https://github.com/raphael-baena/DTLR.
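To make the contrast with autoregressive decoding concrete, the following is a minimal sketch of how a set of character detections predicted in parallel could be turned into a transcription. It is not taken from the paper or its repository: the `CharDetection` type, the `read_line` helper, the `min_score` threshold, and the left-to-right reading order are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CharDetection:
    """One hypothetical character detection on a text-line image."""
    char: str        # predicted character class
    x_center: float  # horizontal center of the predicted box, in pixels
    score: float     # detector confidence in [0, 1]


def read_line(detections, min_score=0.5):
    """Keep confident detections and read them left to right.

    Assumption: the script is written left to right and a fixed
    confidence threshold is enough to discard spurious detections.
    """
    kept = [d for d in detections if d.score >= min_score]
    kept.sort(key=lambda d: d.x_center)
    return "".join(d.char for d in kept)


# Detections arrive as an unordered set, since all are predicted at once.
detections = [
    CharDetection("t", 52.0, 0.91),
    CharDetection("c", 10.0, 0.97),
    CharDetection("a", 31.0, 0.88),
    CharDetection("o", 33.0, 0.12),  # low-confidence detection, dropped
]
print(read_line(detections))
```

In this sketch the reading order comes only from the predicted positions, so no character depends on a previously decoded one, which is the property that distinguishes a detection-based readout from one-by-one autoregressive prediction.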

Cite

Text

Baena et al. "General Detection-Based Text Line Recognition." Neural Information Processing Systems, 2024. doi:10.52202/079017-1342

Markdown

[Baena et al. "General Detection-Based Text Line Recognition." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/baena2024neurips-general/) doi:10.52202/079017-1342

BibTeX

@inproceedings{baena2024neurips-general,
  title     = {{General Detection-Based Text Line Recognition}},
  author    = {Baena, Raphael and Kalleli, Syrine and Aubry, Mathieu},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-1342},
  url       = {https://mlanthology.org/neurips/2024/baena2024neurips-general/}
}