Deep Features for Text Spotting

Abstract

The goal of this work is text spotting in natural images. This is divided into two sequential tasks: detecting word regions in the image, and recognizing the words within these regions. We make the following contributions: first, we develop a Convolutional Neural Network (CNN) classifier that can be used for both tasks. The CNN has a novel architecture that enables efficient feature sharing (by using a number of layers in common) for text detection, case-sensitive and case-insensitive character classification, and bigram classification. It exceeds state-of-the-art performance on all of these. Second, we make a number of technical changes to traditional CNN architectures, including no downsampling for a per-pixel sliding window, and multi-mode learning with a mixture of linear models (maxout). Third, we present a method for automated data mining of Flickr that generates word- and character-level annotations. Finally, these components are used together to form an end-to-end, state-of-the-art text spotting system. We evaluate the text spotting system on two standard benchmarks, the ICDAR Robust Reading data set and the Street View Text data set, and demonstrate improvements over the state of the art on multiple measures.
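
The maxout unit named in the abstract ("a mixture of linear models") outputs the maximum over several linear responses to the same input. The following is a minimal sketch of that idea in plain Python, not the authors' implementation: the function name `maxout_unit` and the toy weights are assumptions made for illustration, and the paper's network applies the same principle inside convolutional layers.

```python
from typing import Sequence


def maxout_unit(x: Sequence[float],
                weights: Sequence[Sequence[float]],
                biases: Sequence[float]) -> float:
    """Return max_k (w_k . x + b_k) over the K linear pieces of one unit."""
    if len(weights) != len(biases):
        raise ValueError("need exactly one bias per linear piece")
    # Each linear piece produces its own response to the same input.
    responses = [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    # The unit keeps only the strongest response.
    return max(responses)


# Toy values (hypothetical): the two pieces x and -x together give |x|.
abs_weights = [[1.0], [-1.0]]
abs_biases = [0.0, 0.0]
outputs = [maxout_unit([v], abs_weights, abs_biases) for v in (-2.0, 0.5, 3.0)]
print(outputs)  # [2.0, 0.5, 3.0]
```

Because the maximum of linear functions is piecewise linear and convex, a single unit with enough pieces can approximate activation shapes such as the absolute value shown here, rather than being fixed to one nonlinearity.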

Cite

Text

Jaderberg et al. "Deep Features for Text Spotting." European Conference on Computer Vision, 2014. doi:10.1007/978-3-319-10593-2_34

BibTeX

@inproceedings{jaderberg2014eccv-deep,
  title     = {{Deep Features for Text Spotting}},
  author    = {Jaderberg, Max and Vedaldi, Andrea and Zisserman, Andrew},
  booktitle = {European Conference on Computer Vision},
  year      = {2014},
  pages     = {512--528},
  doi       = {10.1007/978-3-319-10593-2_34},
  url       = {https://mlanthology.org/eccv/2014/jaderberg2014eccv-deep/}
}