Text Spotting Transformers

Abstract

In this paper, we present TExt Spotting TRansformers (TESTR), a generic end-to-end text spotting framework that uses Transformers for text detection and recognition in the wild. TESTR builds on a single encoder and dual decoders for joint text-box control point regression and character recognition. Unlike most existing methods, ours is free from Region-of-Interest operations and heuristics-driven post-processing; TESTR is particularly effective on curved text-boxes, where special care is needed to adapt traditional bounding-box representations. We show that our canonical representation of control points is suitable for text instances in both Bezier curve and polygon annotations. In addition, we design a bounding-box guided polygon detection (box-to-polygon) process. Experiments on curved and arbitrarily shaped datasets demonstrate the state-of-the-art performance of the proposed TESTR algorithm.
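To make the relationship between Bezier curve and polygon annotations concrete, the following is a minimal sketch of how a curved text boundary given as control points can be sampled into polygon vertices. It is not TESTR's implementation. It assumes, as an illustration only, that each text instance is bounded by two cubic Bezier curves with four control points each; the names `cubic_bezier` and `bezier_to_polygon`, the sample count `n`, and the point ordering are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cubic_bezier(ctrl: List[Point], t: float) -> Point:
    """Evaluate a cubic Bezier curve with 4 control points at t in [0, 1]."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = ctrl
    s = 1.0 - t
    # Bernstein basis polynomials of degree 3.
    b0, b1, b2, b3 = s**3, 3 * s**2 * t, 3 * s * t**2, t**3
    return (b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3,
            b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3)


def bezier_to_polygon(top: List[Point], bottom: List[Point], n: int = 8) -> List[Point]:
    """Sample n >= 2 points on each curve and join them into one outline.

    Assumption: `top` runs left to right and `bottom` runs right to left,
    so concatenating the samples walks the boundary in a single direction.
    """
    ts = [i / (n - 1) for i in range(n)]
    return [cubic_bezier(top, t) for t in ts] + [cubic_bezier(bottom, t) for t in ts]


# Hypothetical control points for one curved text instance.
top = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0), (3.0, 0.0)]
bottom = [(3.0, -1.0), (2.0, 0.0), (1.0, 0.0), (0.0, -1.0)]
polygon = bezier_to_polygon(top, bottom, n=4)
```

Here `polygon` holds `2 * n` vertices, starting at the first control point of `top` and ending at the last control point of `bottom`. Fixing the traversal order in this way is one simple option for keeping curve-based and polygon-based annotations comparable.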

Cite

Text

Zhang et al. "Text Spotting Transformers." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00930

Markdown

[Zhang et al. "Text Spotting Transformers." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/zhang2022cvpr-text/) doi:10.1109/CVPR52688.2022.00930

BibTeX

@inproceedings{zhang2022cvpr-text,
  title     = {{Text Spotting Transformers}},
  author    = {Zhang, Xiang and Su, Yongwen and Tripathi, Subarna and Tu, Zhuowen},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {9519--9528},
  doi       = {10.1109/CVPR52688.2022.00930},
  url       = {https://mlanthology.org/cvpr/2022/zhang2022cvpr-text/}
}