Towards Learning Spatially Discriminative Feature Representations

Abstract

The backbone of a traditional CNN classifier is generally regarded as a feature extractor, followed by a linear layer that performs the classification. We propose a novel loss function, termed CAM-loss, that constrains the embedded feature maps with class activation maps (CAMs), which indicate the spatially discriminative regions of an image for particular categories. CAM-loss drives the backbone to express the features of the target category and suppress the features of non-target categories or background, so as to obtain more discriminative feature representations. It can be applied to any CNN architecture with negligible additional parameters and computation. Experimental results show that CAM-loss is applicable to a variety of network structures and can be combined with mainstream regularization methods to improve image classification performance. The strong generalization ability of CAM-loss is validated on transfer learning and few-shot learning tasks. Based on CAM-loss, we also propose a novel CAAM-CAM matching knowledge distillation method. This method directly uses the CAM generated by the teacher network to supervise the CAAM generated by the student network, which effectively improves the accuracy and convergence rate of the student network.
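The idea of constraining feature maps with a CAM can be illustrated with a minimal NumPy sketch. The CAM itself follows the standard definition (the final-layer feature maps weighted by the linear classifier's weights for one class). Everything else is an assumption made for illustration and is not stated in the abstract: the aggregated feature map is taken to be a plain sum over channels, both maps are min-max normalized, and the penalty is their mean absolute difference. All function names are hypothetical.

```python
import numpy as np


def min_max_normalize(m, eps=1e-8):
    """Rescale a 2D map to roughly [0, 1] (assumed normalization)."""
    m = m - m.min()
    return m / (m.max() + eps)


def class_activation_map(features, fc_weights, class_idx):
    """Standard CAM: weight each feature map by the classifier weight of one class.

    features:   array of shape (K, H, W), final-layer feature maps
    fc_weights: array of shape (num_classes, K), linear classifier weights
    """
    return np.tensordot(fc_weights[class_idx], features, axes=1)  # shape (H, W)


def aggregated_feature_map(features):
    """Class-agnostic map, assumed here to be an unweighted sum over channels."""
    return features.sum(axis=0)  # shape (H, W)


def cam_matching_loss(features, fc_weights, target):
    """Assumed penalty: mean absolute difference between the two normalized maps."""
    cam = min_max_normalize(class_activation_map(features, fc_weights, target))
    agg = min_max_normalize(aggregated_feature_map(features))
    return float(np.abs(agg - cam).mean())
```

In training, a term of this kind would be added to the usual classification loss with a weighting coefficient. For the distillation variant described in the abstract, the `cam` term would instead come from the teacher network while the aggregated map comes from the student; the sketch above does not implement that case.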

Cite

Text

Wang et al. "Towards Learning Spatially Discriminative Feature Representations." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00136

Markdown

[Wang et al. "Towards Learning Spatially Discriminative Feature Representations." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/wang2021iccv-learning-b/) doi:10.1109/ICCV48922.2021.00136

BibTeX

@inproceedings{wang2021iccv-learning-b,
  title     = {{Towards Learning Spatially Discriminative Feature Representations}},
  author    = {Wang, Chaofei and Xiao, Jiayu and Han, Yizeng and Yang, Qisen and Song, Shiji and Huang, Gao},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {1326-1335},
  doi       = {10.1109/ICCV48922.2021.00136},
  url       = {https://mlanthology.org/iccv/2021/wang2021iccv-learning-b/}
}