Pixel-Wise Contrastive Distillation

Abstract

We present a simple but effective pixel-level self-supervised distillation framework that is friendly to dense prediction tasks. Our method, called Pixel-Wise Contrastive Distillation (PCD), distills knowledge by attracting the corresponding pixels of the student's and teacher's output feature maps. PCD includes a novel design called SpatialAdaptor, which "reshapes" a part of the teacher network while preserving the distribution of its output features. Our ablation experiments suggest that this reshaping behavior enables more informative pixel-to-pixel distillation. Moreover, we utilize a plug-in multi-head self-attention module that explicitly relates the pixels of the student's feature maps to enlarge the effective receptive field, leading to a more competitive student. PCD outperforms previous self-supervised distillation methods on various dense prediction tasks. A ResNet-18-FPN backbone distilled by PCD achieves 37.4 AP-bbox and 34.0 AP-mask on the COCO dataset with a Mask R-CNN detector. We hope our study will inspire future research on how to pre-train a small model that is friendly to dense prediction tasks in a self-supervised fashion.
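To make the core idea concrete, below is a minimal sketch of a pixel-wise contrastive (InfoNCE-style) distillation loss in PyTorch. It assumes the student and teacher produce feature maps of the same spatial size; the function name `pixel_contrastive_loss` and the `temperature` value are our own illustrative choices, not the authors' reference implementation, and the sketch omits PCD-specific components such as the SpatialAdaptor and the plug-in self-attention module.

```python
# Illustrative sketch only: attract each student pixel to the teacher pixel at
# the same spatial location, contrasting it against all other teacher pixels.
import torch
import torch.nn.functional as F


def pixel_contrastive_loss(student_feat, teacher_feat, temperature=0.2):
    """student_feat, teacher_feat: tensors of shape (N, C, H, W)."""
    n, c, h, w = student_feat.shape
    # Flatten spatial dimensions so each pixel becomes one embedding: (N*H*W, C)
    s = F.normalize(student_feat.permute(0, 2, 3, 1).reshape(-1, c), dim=1)
    t = F.normalize(teacher_feat.permute(0, 2, 3, 1).reshape(-1, c), dim=1)
    # Cosine similarity of every student pixel to every teacher pixel.
    logits = s @ t.t() / temperature               # (N*H*W, N*H*W)
    # Positives are the spatially corresponding teacher pixels (the diagonal).
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)


# Usage sketch (teacher kept frozen during distillation):
# student_feat = student_backbone(images)          # e.g. (N, 256, H, W)
# with torch.no_grad():
#     teacher_feat = teacher_backbone(images)      # matching shape assumed
# loss = pixel_contrastive_loss(student_feat, teacher_feat)
```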

Cite

Text

Huang and Guo. "Pixel-Wise Contrastive Distillation." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01499

Markdown

[Huang and Guo. "Pixel-Wise Contrastive Distillation." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/huang2023iccv-pixelwise/) doi:10.1109/ICCV51070.2023.01499

BibTeX

@inproceedings{huang2023iccv-pixelwise,
  title     = {{Pixel-Wise Contrastive Distillation}},
  author    = {Huang, Junqiang and Guo, Zichao},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {16359-16369},
  doi       = {10.1109/ICCV51070.2023.01499},
  url       = {https://mlanthology.org/iccv/2023/huang2023iccv-pixelwise/}
}