Learning Visual Shape Lexicon for Document Image Content Recognition
Abstract
Developing effective content recognition methods for diverse imagery remains a challenge for computer vision researchers. We present a new approach to document image content categorization that uses a lexicon of shape features. Each lexical word corresponds to a scale- and rotation-invariant shape feature that is generic enough to be detected repeatably and without segmentation. We learn a concise, structurally indexed shape lexicon from training data by clustering and partitioning feature types with graph cuts. We demonstrate the approach on two challenging document image content recognition problems: (1) classifying 4,500 Web images crawled from Google Image Search into three content categories (pure image, image with text, and document image), and (2) identifying 8 languages (Arabic, Chinese, English, Hindi, Japanese, Korean, Russian, and Thai) in a database of 1,512 complex document images that mix machine-printed text and handwriting. The approach handles high intra-class variability and outperforms other state-of-the-art approaches, which makes it suitable as a content recognizer in image indexing and retrieval systems.
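The general idea of describing an image by how often each learned shape "word" occurs can be sketched as a bag-of-words pipeline. This is a minimal illustration under stated assumptions, not the authors' method. The descriptors are assumed to be already extracted, scale- and rotation-invariant vectors, and the extractor itself is not shown. Plain k-means with farthest-point initialisation stands in for the paper's clustering and graph-cut partitioning. A 1-nearest-neighbour rule is an assumed classifier. All function names (`learn_lexicon`, `lexicon_histogram`, `classify`) and the toy data are hypothetical.

```python
# Minimal bag-of-shape-words sketch (assumptions labelled in comments).
from collections import Counter


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(vec, centers):
    """Index of the center closest to vec."""
    return min(range(len(centers)), key=lambda i: dist2(vec, centers[i]))


def learn_lexicon(descriptors, k, iters=20):
    """Cluster shape descriptors into k lexical words.

    Assumption: plain k-means with farthest-point initialisation stands in
    for the paper's clustering-plus-graph-cut partitioning.
    """
    centers = [list(descriptors[0])]
    while len(centers) < k:
        # Pick the descriptor farthest from every center chosen so far.
        far = max(descriptors, key=lambda d: min(dist2(d, c) for c in centers))
        centers.append(list(far))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest(d, centers)].append(d)
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous center
                centers[i] = [sum(col) / len(g) for col in zip(*g)]
    return centers


def lexicon_histogram(descriptors, lexicon):
    """Normalised frequency of each lexical word among one image's features."""
    counts = Counter(nearest(d, lexicon) for d in descriptors)
    return [counts[i] / len(descriptors) for i in range(len(lexicon))]


def classify(hist, training):
    """1-nearest-neighbour over (histogram, label) pairs (assumed classifier)."""
    return min(training, key=lambda pair: dist2(hist, pair[0]))[1]


# Toy 2-D "shape descriptors" (hypothetical data, not from the paper).
document_img = [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1)]
pure_image_img = [(5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]

lexicon = learn_lexicon(document_img + pure_image_img, k=2)
training = [
    (lexicon_histogram(document_img, lexicon), "document"),
    (lexicon_histogram(pure_image_img, lexicon), "pure image"),
]
query = [(0.05, 0.05), (0.15, 0.0), (5.1, 5.0)]
print(classify(lexicon_histogram(query, lexicon), training))  # prints: document
```

The query has two of its three features near the "document" word, so its histogram is [2/3, 1/3]. That histogram lies closest to the document training example. Representing each image as a fixed-length word histogram is what lets images with very different numbers of detected features be compared directly.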
Cite
Zhu et al. "Learning Visual Shape Lexicon for Document Image Content Recognition." European Conference on Computer Vision, 2008. doi:10.1007/978-3-540-88688-4_55
@inproceedings{zhu2008eccv-learning,
title = {{Learning Visual Shape Lexicon for Document Image Content Recognition}},
author = {Zhu, Guangyu and Yu, Xiaodong and Li, Yi and Doermann, David S.},
booktitle = {European Conference on Computer Vision},
year = {2008},
pages = {745--758},
doi = {10.1007/978-3-540-88688-4_55},
url = {https://mlanthology.org/eccv/2008/zhu2008eccv-learning/}
}