Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned Receipt Images

Abstract

Digitization of scanned receipts aims to extract text from receipt images and save it into structured documents. This is usually split into two sub-tasks: text localization and optical character recognition (OCR). Most existing OCR models focus only on cropped text instance images, which require bounding box information provided by a text region detection model. Introducing an additional detector to identify the text instance images in advance adds complexity; however, instance-level OCR models have very low accuracy when applied to a whole image for document-level OCR, for example a receipt image containing multiple text lines arranged in various layouts. To this end, we propose a localization-free document-level OCR model for transcribing all the characters in a receipt image into an ordered sequence end-to-end. Specifically, we finetune the pretrained instance-level model TrOCR with randomly cropped image chunks, and gradually increase the image chunk size to generalize the recognition ability from instance images to full-page images. In our experiments on the SROIE receipt OCR dataset, the model finetuned with our strategy achieved a 64.4 F1-score and a 22.8% character error rate (CER), outperforming the baseline, which achieved a 48.5 F1-score and a 50.6% CER. The best model, which splits the full image into 15 equally sized chunks, gives an 87.8 F1-score and a 4.98% CER with minimal additional pre- or post-processing of the output. Moreover, the characters in the generated document-level sequences are arranged in reading order, which is practical for real-world applications.
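As a rough illustration of two quantities mentioned in the abstract, the sketch below computes row boundaries for splitting a page into equally sized chunks and a character error rate between a reference and a predicted transcription. It is not the authors' code. The function names (`chunk_bounds`, `edit_distance`, `cer`) are hypothetical, the chunks are assumed to be horizontal strips (the abstract does not state the split direction), and CER is assumed to follow the common definition of Levenshtein distance divided by reference length.

```python
from typing import List, Tuple


def chunk_bounds(height: int, n_chunks: int) -> List[Tuple[int, int]]:
    """Split `height` pixel rows into `n_chunks` near-equal (top, bottom) ranges.

    Assumption: chunks are horizontal strips stacked top to bottom.
    """
    if n_chunks <= 0 or height < n_chunks:
        raise ValueError("need 0 < n_chunks <= height")
    # Integer arithmetic keeps the edges monotonic and covers every row once.
    edges = [(i * height) // n_chunks for i in range(n_chunks + 1)]
    return list(zip(edges[:-1], edges[1:]))


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions all cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical page height; 15 chunks as in the best model of the abstract.
    print(chunk_bounds(1500, 15)[:3])
    # Hypothetical reference and prediction strings, not taken from SROIE.
    print(f"{cer('TOTAL 12.50', 'TOTAL 12.80'):.2%}")
```

In a chunked pipeline of this kind, each strip would be transcribed separately and the outputs concatenated top to bottom before scoring against the full-page reference; that ordering step is likewise an assumption here, not a detail given in the abstract.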

Cite

Text

Zhang et al. "Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned Receipt Images." IEEE/CVF International Conference on Computer Vision Workshops, 2023. doi:10.1109/ICCVW60793.2023.00160

Markdown

[Zhang et al. "Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned Receipt Images." IEEE/CVF International Conference on Computer Vision Workshops, 2023.](https://mlanthology.org/iccvw/2023/zhang2023iccvw-extending/) doi:10.1109/ICCVW60793.2023.00160

BibTeX

@inproceedings{zhang2023iccvw-extending,
  title     = {{Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned Receipt Images}},
  author    = {Zhang, Hongkuan and Whittaker, Edward and Kitagishi, Ikuo},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2023},
  pages     = {1471--1477},
  doi       = {10.1109/ICCVW60793.2023.00160},
  url       = {https://mlanthology.org/iccvw/2023/zhang2023iccvw-extending/}
}