Patch-Level Contrastive Learning via Positional Query for Visual Pre-Training

Abstract

Dense contrastive learning (DCL) has recently been explored for learning localized information for dense prediction tasks (e.g., detection and segmentation). It still suffers from the difficulty of mining pixel/patch correspondence between two views. A simple workaround is to input the same view twice and align the pixel/patch representations; however, this reduces the variance of the inputs and hurts performance. We propose a plug-in method, PQCL (Positional Query for patch-level Contrastive Learning), which allows performing patch-level contrasts between two views with exact patch correspondence. Moreover, by using positional queries, PQCL increases the variance of the inputs, which enhances training. We apply PQCL to popular transformer-based CL frameworks (DINO and iBOT) and evaluate it on classification, detection, and segmentation tasks, where our method obtains stable improvements, especially for dense tasks. It achieves a new state of the art in most settings. Code is available at https://github.com/Sherrylone/Query_Contrastive.
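
For intuition only, the sketch below shows a patch-level InfoNCE loss that assumes exact patch correspondence between two views is already available — the quantity PQCL obtains via positional queries. This is not the paper's implementation; the function name, shapes, and temperature are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def patch_infonce(z1, z2, temperature=0.1):
    """Hypothetical patch-level InfoNCE between two views whose patch
    embeddings are aligned index-by-index, i.e., z1[b, i] and z2[b, i]
    describe the same image region. Shapes: (B, N, D)."""
    B, N, D = z1.shape
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    # Per-image similarity of every patch in view 1 to every patch in view 2.
    logits = torch.einsum("bnd,bmd->bnm", z1, z2) / temperature  # (B, N, N)
    # With exact correspondence, the positive for patch i is patch i;
    # all other patches of the same image act as negatives.
    targets = torch.arange(N, device=z1.device).expand(B, N)
    return F.cross_entropy(logits.reshape(B * N, N), targets.reshape(B * N))

# Usage with dummy patch embeddings (batch 4, 196 patches, dim 256):
loss = patch_infonce(torch.randn(4, 196, 256), torch.randn(4, 196, 256))
```

The hard part PQCL addresses is precisely what this sketch takes for granted: under two different crops/augmentations, the index-to-index alignment no longer holds, and positional queries recover it without feeding the same view twice.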

Cite

Text

Zhang et al. "Patch-Level Contrastive Learning via Positional Query for Visual Pre-Training." International Conference on Machine Learning, 2023.

Markdown

[Zhang et al. "Patch-Level Contrastive Learning via Positional Query for Visual Pre-Training." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/zhang2023icml-patchlevel/)

BibTeX

@inproceedings{zhang2023icml-patchlevel,
  title     = {{Patch-Level Contrastive Learning via Positional Query for Visual Pre-Training}},
  author    = {Zhang, Shaofeng and Zhou, Qiang and Wang, Zhibin and Wang, Fan and Yan, Junchi},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {41990--41999},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/zhang2023icml-patchlevel/}
}