Learning Token-Based Representation for Image Retrieval

Abstract

In image retrieval, deep local features learned in a data-driven manner have proven effective for improving retrieval performance. To enable efficient retrieval on large image databases, some approaches quantize deep local features with a large codebook and match images with an aggregated match kernel. However, the complexity of these approaches is non-trivial and their memory footprint is large, which limits their capability to jointly perform feature learning and aggregation. To generate compact global representations while maintaining regional matching capability, we propose a unified framework that jointly learns local feature representation and aggregation. In our framework, we first extract local features using CNNs. Then, we design a tokenizer module to aggregate them into a few visual tokens, each corresponding to a specific visual pattern. This helps to remove background noise and capture more discriminative regions in the image. Next, a refinement block is introduced to enhance the visual tokens with self-attention and cross-attention. Finally, the different visual tokens are concatenated to generate a compact global representation. The whole framework is trained end-to-end with image-level labels. Extensive experiments demonstrate that our approach outperforms the state-of-the-art methods on the Revisited Oxford and Paris datasets.
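The pipeline described in the abstract (CNN local features, a tokenizer that aggregates them into a few visual tokens, a refinement block with self- and cross-attention, and concatenation into a compact global descriptor) can be sketched in PyTorch as below. This is a minimal illustrative sketch: the module names, dimensions, and the attention-based tokenizer are assumptions made for exposition, not the authors' exact implementation.

import torch
import torch.nn as nn


class TokenRetrievalHead(nn.Module):
    """Aggregates CNN local features into a few refined visual tokens (illustrative sketch)."""

    def __init__(self, feat_dim=512, num_tokens=4, num_heads=8):
        super().__init__()
        # One learnable query per visual token / visual pattern (assumed design choice).
        self.tokens = nn.Parameter(torch.randn(num_tokens, feat_dim))
        # Tokenizer: tokens attend to local features and aggregate them.
        self.tokenize = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        # Refinement block: self-attention among tokens, then cross-attention back to locals.
        self.self_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(feat_dim)
        self.norm2 = nn.LayerNorm(feat_dim)

    def forward(self, local_feats):
        # local_feats: (B, C, H, W) feature map from a CNN backbone.
        b, c, h, w = local_feats.shape
        locs = local_feats.flatten(2).transpose(1, 2)             # (B, H*W, C)
        queries = self.tokens.unsqueeze(0).expand(b, -1, -1)      # (B, T, C)
        tok, _ = self.tokenize(queries, locs, locs)               # aggregate locals into tokens
        tok = self.norm1(tok + self.self_attn(tok, tok, tok)[0])      # refine: self-attention
        tok = self.norm2(tok + self.cross_attn(tok, locs, locs)[0])   # refine: cross-attention
        # Concatenate all tokens into one compact, L2-normalized global descriptor.
        return nn.functional.normalize(tok.flatten(1), dim=-1)    # (B, T*C)


feats = torch.randn(2, 512, 16, 16)    # stand-in for local features from a CNN
descriptor = TokenRetrievalHead()(feats)
print(descriptor.shape)                # torch.Size([2, 2048])

In practice, such a head would sit on top of the CNN backbone and be trained end-to-end with image-level labels, as the abstract describes; the resulting descriptors can then be compared by inner product for retrieval.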

Cite

Text

Wu et al. "Learning Token-Based Representation for Image Retrieval." AAAI Conference on Artificial Intelligence, 2022. doi:10.1609/AAAI.V36I3.20173

Markdown

[Wu et al. "Learning Token-Based Representation for Image Retrieval." AAAI Conference on Artificial Intelligence, 2022.](https://mlanthology.org/aaai/2022/wu2022aaai-learning/) doi:10.1609/AAAI.V36I3.20173

BibTeX

@inproceedings{wu2022aaai-learning,
  title     = {{Learning Token-Based Representation for Image Retrieval}},
  author    = {Wu, Hui and Wang, Min and Zhou, Wengang and Hu, Yang and Li, Houqiang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {2703--2711},
  doi       = {10.1609/AAAI.V36I3.20173},
  url       = {https://mlanthology.org/aaai/2022/wu2022aaai-learning/}
}