Patch-Level Contrastive Learning via Positional Query for Visual Pre-Training
Abstract
Dense contrastive learning (DCL) has recently been explored for learning localized information for dense prediction tasks (e.g., detection and segmentation). It still suffers from the difficulty of mining pixel/patch correspondences between two views. A simple workaround is to input the same view twice and align the pixel/patch representations; however, this reduces the variance of the inputs and hurts performance. We propose a plug-in method, PQCL (Positional Query for patch-level Contrastive Learning), which enables patch-level contrasts between two views with exact patch correspondence. Moreover, by using positional queries, PQCL increases the variance of the inputs, which enhances training. We apply PQCL to popular transformer-based CL frameworks (DINO and iBOT) and evaluate them on classification, detection, and segmentation tasks, where our method obtains consistent improvements, especially on dense tasks. It achieves a new state of the art in most settings. Code is available at https://github.com/Sherrylone/Query_Contrastive.
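To make the core idea concrete, below is a minimal PyTorch-style sketch of patch-level contrastive learning with exact correspondence recovered from crop geometry. This is not the authors' implementation (see the linked repository for that, including the positional-query mechanism); all helper names (`patch_centers`, `match_patches`, `patch_info_nce`) are illustrative, and the matching here is a simplified geometric variant assuming the crop boxes of both views are known.

```python
# Illustrative sketch: contrast patches of two views whose correspondence is
# derived from the known crop coordinates, rather than from the same view twice.
import torch
import torch.nn.functional as F

def patch_centers(crop_box, grid=14):
    # Absolute (x, y) centers of a crop's patch grid, in original-image coordinates.
    x0, y0, x1, y1 = crop_box
    xs = torch.linspace(x0, x1, grid + 1)[:-1] + (x1 - x0) / (2 * grid)
    ys = torch.linspace(y0, y1, grid + 1)[:-1] + (y1 - y0) / (2 * grid)
    return torch.stack(torch.meshgrid(xs, ys, indexing="xy"), dim=-1).reshape(-1, 2)

def match_patches(box_a, box_b, grid=14):
    # For each patch of view A, the index of the geometrically nearest patch of
    # view B (exact correspondence up to the resolution of the patch grid).
    ca, cb = patch_centers(box_a, grid), patch_centers(box_b, grid)
    return torch.cdist(ca, cb).argmin(dim=1)

def patch_info_nce(za, zb, idx, tau=0.2):
    # za, zb: (N, D) patch embeddings of the two views; idx: matched positives.
    za = F.normalize(za, dim=-1)
    zb = F.normalize(zb, dim=-1)
    logits = za @ zb.t() / tau           # similarity of every A-patch to every B-patch
    return F.cross_entropy(logits, idx)  # positives = geometrically matched patches
```

Because the two views are genuinely different crops, input variance is preserved while the patch-level targets remain exact, which is the property the abstract attributes to PQCL.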
Cite

Text

Zhang et al. "Patch-Level Contrastive Learning via Positional Query for Visual Pre-Training." International Conference on Machine Learning, 2023.

Markdown

[Zhang et al. "Patch-Level Contrastive Learning via Positional Query for Visual Pre-Training." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/zhang2023icml-patchlevel/)

BibTeX
@inproceedings{zhang2023icml-patchlevel,
title = {{Patch-Level Contrastive Learning via Positional Query for Visual Pre-Training}},
author = {Zhang, Shaofeng and Zhou, Qiang and Wang, Zhibin and Wang, Fan and Yan, Junchi},
booktitle = {International Conference on Machine Learning},
year = {2023},
pages = {41990-41999},
volume = {202},
url = {https://mlanthology.org/icml/2023/zhang2023icml-patchlevel/}
}