General Detection-Based Text Line Recognition
Abstract
We introduce a general detection-based approach to text line recognition, whether for printed text (OCR) or handwritten text (HTR), with Latin, Chinese, or ciphered characters. Detection-based approaches have until now largely been discarded for HTR because reading characters separately is often challenging, and character-level annotation is difficult and expensive. We overcome these challenges thanks to three main insights: (i) synthetic pre-training with diverse enough data to learn reasonable character localization in any script; (ii) modern transformer-based detectors can jointly detect a large number of instances and, if trained with an adequate masking strategy, leverage consistency between the different detections; (iii) once a pre-trained detection model with approximate character localization is available, it is possible to fine-tune it with line-level annotation on real data, even with a different alphabet. Our approach thus builds on a completely different paradigm from most state-of-the-art methods, which rely on autoregressive decoding and predict character values one by one, whereas we process a complete line in parallel. Remarkably, our method demonstrates good performance on a range of scripts usually tackled with specialized approaches: Latin script, Chinese script, and ciphers, for which we significantly improve on state-of-the-art performance. Our code and models are available at https://github.com/raphael-baena/DTLR.
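To make the contrast with autoregressive decoding concrete, the following is a minimal sketch of how a line of text could be read out once all characters have been detected in parallel. It is not the authors' implementation: the `Detection` type, the `read_line` helper, the 0.5 confidence threshold, and the left-to-right reading order are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One character box predicted by a detector (hypothetical structure)."""
    char: str        # predicted character class
    x_center: float  # horizontal center of the box, in pixels
    score: float     # detector confidence in [0, 1]


def read_line(detections, min_score=0.5):
    """Assemble a text line from character boxes predicted in parallel.

    Assumes a left-to-right script: low-confidence boxes are dropped and
    the remaining characters are ordered by horizontal position.
    """
    kept = [d for d in detections if d.score >= min_score]
    kept.sort(key=lambda d: d.x_center)
    return "".join(d.char for d in kept)


# Detections arrive in arbitrary order, since no box depends on a previous one.
detections = [
    Detection("t", 52.0, 0.91),
    Detection("c", 10.0, 0.97),
    Detection("a", 31.0, 0.88),
    Detection("o", 30.0, 0.12),  # spurious low-confidence box
]
line = read_line(detections)
```

Because each box is predicted independently of the others, the reading order is recovered from geometry alone instead of from a sequential decoding loop.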
Cite
Text
Baena et al. "General Detection-Based Text Line Recognition." Neural Information Processing Systems, 2024. doi:10.52202/079017-1342
Markdown
[Baena et al. "General Detection-Based Text Line Recognition." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/baena2024neurips-general/) doi:10.52202/079017-1342
BibTeX
@inproceedings{baena2024neurips-general,
title = {{General Detection-Based Text Line Recognition}},
author = {Baena, Raphael and Kalleli, Syrine and Aubry, Mathieu},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-1342},
url = {https://mlanthology.org/neurips/2024/baena2024neurips-general/}
}