Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned Receipt Images
Abstract
Digitization of scanned receipts aims to extract text from receipt images and save it into structured documents. This is usually split into two sub-tasks: text localization and optical character recognition (OCR). Most existing OCR models focus only on cropped text-instance images, which require bounding box information provided by a text region detection model. Introducing an additional detector to identify the text instance images in advance adds complexity; however, instance-level OCR models have very low accuracy when processing the whole image for document-level OCR, such as receipt images containing multiple text lines arranged in various layouts. To this end, we propose a localization-free document-level OCR model for transcribing all the characters in a receipt image into an ordered sequence end-to-end. Specifically, we finetune the pretrained instance-level model TrOCR with randomly cropped image chunks, and gradually increase the image chunk size to generalize the recognition ability from instance images to full-page images. In our experiments on the SROIE receipt OCR dataset, the model finetuned with our strategy achieves an F1-score of 64.4 and a character error rate (CER) of 22.8%, outperforming the baseline results of 48.5 F1-score and 50.6% CER. The best model, which splits the full image into 15 equally sized chunks, gives 87.8 F1-score and 4.98% CER with minimal additional pre- or post-processing of the output. Moreover, the characters in the generated document-level sequences are arranged in the reading order, which is practical for real-world applications.
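As a rough illustration of two quantities mentioned in the abstract, the sketch below splits an image height into equally sized chunks and computes a character error rate. It is not the authors' implementation. The function names (`chunk_row_ranges`, `edit_distance`, `cer`) are hypothetical, the choice of horizontal strips is an assumption (the abstract does not say how the 15 chunks are laid out), and CER is assumed to be the Levenshtein distance divided by the reference length, which is a common definition but is not confirmed by the abstract.

```python
def chunk_row_ranges(height: int, num_chunks: int) -> list[tuple[int, int]]:
    """Split `height` pixel rows into `num_chunks` contiguous, near-equal strips.

    Assumption: chunks are horizontal strips stacked top to bottom.
    Returns (start_row, end_row) pairs with end_row exclusive.
    """
    if num_chunks <= 0 or height < num_chunks:
        raise ValueError("need 1 <= num_chunks <= height")
    bounds = [i * height // num_chunks for i in range(num_chunks + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference character
                curr[j - 1] + 1,         # insert a hypothesis character
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Row ranges for a hypothetical 1500-pixel-tall receipt split 15 ways.
    print(chunk_row_ranges(1500, 15))
    # CER for a made-up reference/prediction pair with one wrong digit.
    print(cer("TOTAL 12.50", "TOTAL 12.80"))
```

In a chunked pipeline of this kind, each strip would be transcribed separately and the outputs concatenated top to bottom before scoring. How the paper merges chunk outputs is not stated in the abstract.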
Cite
Text
Zhang et al. "Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned Receipt Images." IEEE/CVF International Conference on Computer Vision Workshops, 2023. doi:10.1109/ICCVW60793.2023.00160
Markdown
[Zhang et al. "Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned Receipt Images." IEEE/CVF International Conference on Computer Vision Workshops, 2023.](https://mlanthology.org/iccvw/2023/zhang2023iccvw-extending/) doi:10.1109/ICCVW60793.2023.00160
BibTeX
@inproceedings{zhang2023iccvw-extending,
title = {{Extending TrOCR for Text Localization-Free OCR of Full-Page Scanned Receipt Images}},
author = {Zhang, Hongkuan and Whittaker, Edward and Kitagishi, Ikuo},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2023},
pages = {1471-1477},
doi = {10.1109/ICCVW60793.2023.00160},
url = {https://mlanthology.org/iccvw/2023/zhang2023iccvw-extending/}
}