Tools for Developing OCRs for Indian Scripts

Abstract

Development of OCRs for Indian scripts is an active area of research today. Indian scripts present great challenges to an OCR designer due to the large number of letters in the alphabet, the sophisticated ways in which they combine, and the complicated graphemes that result. The problem is compounded by the unstructured manner in which popular fonts are designed. At the same time, the different Indian scripts share a great deal of common structure. In this paper, we argue that a number of automatic and semi-automatic tools can ease the development of recognizers for new font styles and new scripts. We briefly discuss three such tools that we developed and show how they have helped build new OCRs. An integrated approach to the design of OCRs for all Indian scripts has great benefits. Following this approach, we are building OCRs for many Indian languages as part of a system that provides tools for creating content in them.

Cite

Text

Kumar et al. "Tools for Developing OCRs for Indian Scripts." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2003. doi:10.1109/CVPRW.2003.10023

Markdown

[Kumar et al. "Tools for Developing OCRs for Indian Scripts." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2003.](https://mlanthology.org/cvprw/2003/kumar2003cvprw-tools/) doi:10.1109/CVPRW.2003.10023

BibTeX

@inproceedings{kumar2003cvprw-tools,
  title     = {{Tools for Developing OCRs for Indian Scripts}},
  author    = {Kumar, M. N. S. S. K. Pavan and Kiran, S. S. Ravi and Nayani, Abhishek and Jawahar, C. V. and Narayanan, P. J.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2003},
  pages     = {33},
  doi       = {10.1109/CVPRW.2003.10023},
  url       = {https://mlanthology.org/cvprw/2003/kumar2003cvprw-tools/}
}