Sequence-to-Sequence Contrastive Learning for Text Recognition

Aberdam, Aviad; Litman, Ron; Tsiper, Shahar; Anschel, Oron; Slossberg, Ron; Mazor, Shai; Manmatha, R.; Perona, Pietro

doi:10.1109/CVPR46437.2021.01505

CVPR 2021 pp. 15302-15312

Abstract

We propose a framework for sequence-to-sequence contrastive learning (SeqCLR) of visual representations, which we apply to text recognition. To account for the sequence-to-sequence structure, each feature map is divided into different instances over which the contrastive loss is computed. This operation enables us to contrast at a sub-word level, extracting several positive pairs and multiple negative examples from each image. To yield effective visual representations for text recognition, we further propose novel augmentation heuristics, different encoder architectures and custom projection heads. Experiments on handwritten text and on scene text show that when a text decoder is trained on the learned representations, our method outperforms non-sequential contrastive methods. In addition, when the amount of supervision is reduced, SeqCLR significantly improves performance compared with supervised training, and when fine-tuned with 100% of the labels, our method achieves state-of-the-art results on standard handwritten text recognition benchmarks.
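
The abstract's core step is splitting each feature map into instances and contrasting them across two augmented views. The following is a minimal NumPy sketch of that idea, written under stated assumptions; it is not the authors' implementation. The instance mapping here is a hypothetical fixed-window average over consecutive frames. The loss is a SimCLR-style NT-Xent, named plainly as a stand-in contrastive objective. The function names `window_to_instances` and `nt_xent`, the window size, and the temperature are illustrative choices, and random arrays stand in for encoder outputs.

```python
import numpy as np


def window_to_instances(features: np.ndarray, window: int) -> np.ndarray:
    """Average non-overlapping windows of a (T, D) frame sequence.

    Hypothetical instance mapping (an assumption, not the paper's exact choice):
    T must be divisible by `window`; returns shape (T // window, D).
    """
    t, d = features.shape
    if t % window != 0:
        raise ValueError("sequence length must be divisible by window")
    return features.reshape(t // window, window, d).mean(axis=1)


def nt_xent(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.1) -> float:
    """SimCLR-style NT-Xent: row i of z_a and row i of z_b form a positive pair.

    Every other instance, from either view, acts as a negative.
    """
    z = np.concatenate([z_a, z_b], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit vectors -> cosine similarity
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # an instance is never its own positive or negative
    n = z_a.shape[0]
    # Index of each row's positive partner inside the concatenated array.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable row-wise log-softmax.
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return float(-log_prob[np.arange(2 * n), pos].mean())


# Usage: random stand-ins for encoder outputs on two augmentations of a batch.
rng = np.random.default_rng(0)
batch, frames, dim, window = 4, 24, 16, 4
view_a = rng.normal(size=(batch, frames, dim))
view_b = view_a + 0.05 * rng.normal(size=view_a.shape)  # mild, order-preserving change
inst_a = np.concatenate([window_to_instances(f, window) for f in view_a])
inst_b = np.concatenate([window_to_instances(f, window) for f in view_b])
print(inst_a.shape)  # (24, 16): 4 images x 6 instances each
loss = nt_xent(inst_a, inst_b)
print(loss)
```

In this sketch, instance i of one view is the positive partner of instance i of the other. That pairing only makes sense if both augmentations keep the left-to-right order of frames aligned. Every other instance in the batch, from either view, serves as a negative. This is how a single image yields several positive pairs and many negatives.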

Cite

Text

Aberdam et al. "Sequence-to-Sequence Contrastive Learning for Text Recognition." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.01505

Markdown

[Aberdam et al. "Sequence-to-Sequence Contrastive Learning for Text Recognition." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/aberdam2021cvpr-sequencetosequence/) doi:10.1109/CVPR46437.2021.01505

BibTeX

@inproceedings{aberdam2021cvpr-sequencetosequence,
  title     = {{Sequence-to-Sequence Contrastive Learning for Text Recognition}},
  author    = {Aberdam, Aviad and Litman, Ron and Tsiper, Shahar and Anschel, Oron and Slossberg, Ron and Mazor, Shai and Manmatha, R. and Perona, Pietro},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {15302--15312},
  doi       = {10.1109/CVPR46437.2021.01505},
  url       = {https://mlanthology.org/cvpr/2021/aberdam2021cvpr-sequencetosequence/}
}