Platypus: A Generalized Specialist Model for Reading Text in Various Forms

Abstract

Reading text from images (either natural scenes or documents) has been a long-standing research topic, due to the high technical challenge and wide application range. Previously, individual specialist models were developed to tackle the sub-tasks of text reading (e.g., scene text recognition, handwritten text recognition and mathematical expression recognition). However, such specialist models usually cannot effectively generalize across different sub-tasks. Recently, generalist models (such as GPT-4V), trained on tremendous amounts of data in a unified way, have shown enormous potential in reading text in various scenarios, but with the drawbacks of limited accuracy and low efficiency. In this work, we propose Platypus, a generalized specialist model for text reading. Specifically, Platypus combines the best of both worlds: it is able to recognize text of various forms with a single unified architecture, while achieving excellent accuracy and high efficiency. To better exploit the advantage of Platypus, we also construct a text reading dataset (called Worms), the images of which are curated from previous datasets and partially re-labeled. Experiments on standard benchmarks demonstrate the effectiveness and superiority of the proposed Platypus model. Model and data will be made publicly available at AdvancedLiterateMachinery.

Cite

Text

Wang et al. "Platypus: A Generalized Specialist Model for Reading Text in Various Forms." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72761-0_10

Markdown

[Wang et al. "Platypus: A Generalized Specialist Model for Reading Text in Various Forms." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/wang2024eccv-platypus/) doi:10.1007/978-3-031-72761-0_10

BibTeX

@inproceedings{wang2024eccv-platypus,
  title     = {{Platypus: A Generalized Specialist Model for Reading Text in Various Forms}},
  author    = {Wang, Peng and Li, Zhaohai and Tang, Jun and Zhong, Humen and Huang, Fei and Yang, Zhibo and Yao, Cong},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72761-0_10},
  url       = {https://mlanthology.org/eccv/2024/wang2024eccv-platypus/}
}