Synthetic Data for Text Localisation in Natural Images

Abstract

In this paper we introduce a new method for text detection in natural images. The method comprises two contributions: first, a fast and scalable engine to generate synthetic images of text in clutter. This engine overlays synthetic text onto existing background images in a natural way, accounting for the local 3D scene geometry. Second, we use the synthetic images to train a Fully-Convolutional Regression Network (FCRN) which efficiently performs text detection and bounding-box regression at all locations and multiple scales in an image. We discuss the relation of FCRN to the recently introduced YOLO detector, as well as other end-to-end object detection systems based on deep learning. The resulting detection network significantly outperforms current methods for text detection in natural images, achieving an F-measure of 84.2% on the standard ICDAR 2013 benchmark. Furthermore, it can process 15 images per second on a GPU.
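To make the dense-regression idea concrete, below is a minimal sketch of an FCRN-style head. The abstract states that the network regresses a confidence and bounding-box parameters at every location, fully convolutionally; the paper predicts 7 numbers per grid cell. Everything else here is an assumption for illustration: the choice of PyTorch, the placeholder two-layer backbone, and the layer sizes are not the authors' architecture.

```python
# Illustrative FCRN-style dense detection head (a sketch, not the paper's
# exact network). Assumptions: PyTorch as the framework, a toy 2-layer
# backbone, and illustrative channel counts. The key property it shows is
# that a fully-convolutional network emits, at every cell of a downsampled
# grid, a text confidence plus bounding-box regression parameters, so it
# runs on images of any size and at multiple scales.

import torch
import torch.nn as nn


class FCRNHead(nn.Module):
    """Predicts 7 values per grid cell: 6 box parameters
    (e.g. x, y, w, h and an orientation encoding) plus 1 confidence."""

    def __init__(self, in_channels: int = 256, num_params: int = 7):
        super().__init__()
        # Placeholder backbone; the paper's feature extractor is deeper.
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=5, stride=2, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, in_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # 1x1 convolution: per-cell regression with no fully-connected
        # layers, which is what makes the network fully convolutional.
        self.predict = nn.Conv2d(in_channels, num_params, kernel_size=1)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # images: (N, 3, H, W) -> predictions: (N, 7, H/4, W/4)
        return self.predict(self.features(images))


if __name__ == "__main__":
    net = FCRNHead()
    out = net(torch.randn(1, 3, 512, 512))
    print(out.shape)  # torch.Size([1, 7, 128, 128])
```

Because the head is purely convolutional, multi-scale detection reduces to running the same network on a resized image pyramid and thresholding the per-cell confidences before non-maximum suppression.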

Cite

Text

Gupta et al. "Synthetic Data for Text Localisation in Natural Images." Conference on Computer Vision and Pattern Recognition, 2016. doi:10.1109/CVPR.2016.254

Markdown

[Gupta et al. "Synthetic Data for Text Localisation in Natural Images." Conference on Computer Vision and Pattern Recognition, 2016.](https://mlanthology.org/cvpr/2016/gupta2016cvpr-synthetic/) doi:10.1109/CVPR.2016.254

BibTeX

@inproceedings{gupta2016cvpr-synthetic,
  title     = {{Synthetic Data for Text Localisation in Natural Images}},
  author    = {Gupta, Ankush and Vedaldi, Andrea and Zisserman, Andrew},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2016},
  doi       = {10.1109/CVPR.2016.254},
  url       = {https://mlanthology.org/cvpr/2016/gupta2016cvpr-synthetic/}
}