Knowledge Mining with Scene Text for Fine-Grained Recognition

Abstract

Recently, the semantics of scene text has proven essential in fine-grained image classification. However, existing methods mainly exploit the literal meaning of scene text for fine-grained recognition, which may be irrelevant when the text is not closely related to the objects or scenes. We propose an end-to-end trainable network that mines the implicit contextual knowledge behind scene text and enhances the semantics and correlation to fine-tune the image representation. Unlike existing methods, our model integrates three modalities for fine-grained image classification: visual feature extraction, text semantics extraction, and background knowledge correlation. Specifically, we employ KnowBert to retrieve relevant knowledge for semantic representation and combine it with image features for fine-grained classification. Experiments on two benchmark datasets, Con-Text and Drink Bottle, show that our method outperforms the state-of-the-art by 3.72% mAP and 5.39% mAP, respectively. To further validate the effectiveness of the proposed method, we create a new dataset on crowd activity recognition for evaluation. The source code, new dataset, and pre-trained models of this work will be publicly available.
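
The idea of combining knowledge-enhanced text semantics with image features can be illustrated with a small sketch. It is not the paper's architecture: it assumes the image feature and the per-word text embeddings (which in the paper would come from a visual backbone and KnowBert) are already given as plain vectors, and it uses a simple scaled dot-product attention followed by concatenation as a stand-in fusion step. All names (`attend`, `classify`, `class_weights`) and the toy values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(image_feat, text_embs):
    """Weight each text embedding by its similarity to the image feature.

    Assumption: a scaled dot-product attention stands in for whatever
    correlation module a real model would use.
    """
    scale = math.sqrt(len(image_feat))
    weights = softmax([dot(image_feat, t) / scale for t in text_embs])
    dim = len(text_embs[0])
    pooled = [sum(w * t[i] for w, t in zip(weights, text_embs)) for i in range(dim)]
    return weights, pooled


def classify(image_feat, text_embs, class_weights):
    """Fuse image and attended text features, then apply a linear classifier."""
    weights, pooled = attend(image_feat, text_embs)
    fused = image_feat + pooled  # concatenation of the two vectors
    logits = [dot(w, fused) for w in class_weights]
    return softmax(logits), weights


# Toy inputs (hypothetical): one 2-d image feature, two 2-d text embeddings,
# and a 2-class linear layer over the 4-d fused vector.
image_feat = [1.0, 0.0]
text_embs = [[1.0, 0.0], [0.0, 1.0]]
class_weights = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]

probs, attn = classify(image_feat, text_embs, class_weights)
print("attention over text tokens:", attn)
print("class probabilities:", probs)
```

In this toy setup the first text embedding aligns with the image feature, so it receives the larger attention weight and pushes the prediction toward the first class; text that does not correlate with the image contributes less to the fused representation.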

Cite

Text

Wang et al. "Knowledge Mining with Scene Text for Fine-Grained Recognition." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.00458

Markdown

[Wang et al. "Knowledge Mining with Scene Text for Fine-Grained Recognition." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/wang2022cvpr-knowledge/) doi:10.1109/CVPR52688.2022.00458

BibTeX

@inproceedings{wang2022cvpr-knowledge,
  title     = {{Knowledge Mining with Scene Text for Fine-Grained Recognition}},
  author    = {Wang, Hao and Liao, Junchao and Cheng, Tianheng and Gao, Zewen and Liu, Hao and Ren, Bo and Bai, Xiang and Liu, Wenyu},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2022},
  pages     = {4624--4633},
  doi       = {10.1109/CVPR52688.2022.00458},
  url       = {https://mlanthology.org/cvpr/2022/wang2022cvpr-knowledge/}
}