Learning Deep Features for Discriminative Localization

Abstract

In this work, we revisit the global average pooling layer proposed in [13], and shed light on how it explicitly enables the convolutional neural network (CNN) to have remarkable localization ability despite being trained on image-level labels. While this technique was previously proposed as a means of regularizing training, we find that it actually builds a generic localizable deep representation that exposes the implicit attention of CNNs on an image. Despite the apparent simplicity of global average pooling, we are able to achieve 37.1% top-5 error for object localization on ILSVRC 2014 without training on any bounding box annotation. We demonstrate that our network is able to localize the discriminative image regions on a variety of tasks despite not being trained for them.
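The localization the abstract describes comes from the class activation mapping the paper introduces: for a class c, each convolutional feature map f_k(x, y) is weighted by the classifier weight w_k^c that follows the global average pooling layer, and the weighted maps are summed into a heatmap M_c(x, y) = Σ_k w_k^c f_k(x, y). A minimal NumPy sketch, with illustrative shapes and random placeholder tensors standing in for a trained network's features and weights:

```python
import numpy as np

# Hypothetical dimensions for illustration: 512 feature maps of
# spatial size 14x14, and a 1000-way linear classifier applied
# after global average pooling.
num_maps, h, w, num_classes = 512, 14, 14, 1000
rng = np.random.default_rng(0)
feature_maps = rng.random((num_maps, h, w))       # f_k(x, y)
fc_weights = rng.random((num_classes, num_maps))  # w_k^c

def class_activation_map(features, weights, class_idx):
    """M_c(x, y) = sum_k w_k^c * f_k(x, y): weight each feature map
    by the class's classifier weight and sum over the map index k."""
    return np.tensordot(weights[class_idx], features, axes=1)

cam = class_activation_map(feature_maps, fc_weights, class_idx=0)
# cam has shape (14, 14); upsampling it to the input resolution
# highlights the image regions most discriminative for the class.
```

Because global average pooling and the linear classifier commute, the class score is exactly the spatial average of this heatmap, which is why the map localizes the evidence behind the prediction.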

Cite

Text

Zhou et al. "Learning Deep Features for Discriminative Localization." Conference on Computer Vision and Pattern Recognition, 2016. doi:10.1109/CVPR.2016.319

Markdown

[Zhou et al. "Learning Deep Features for Discriminative Localization." Conference on Computer Vision and Pattern Recognition, 2016.](https://mlanthology.org/cvpr/2016/zhou2016cvpr-learning-a/) doi:10.1109/CVPR.2016.319

BibTeX

@inproceedings{zhou2016cvpr-learning-a,
  title     = {{Learning Deep Features for Discriminative Localization}},
  author    = {Zhou, Bolei and Khosla, Aditya and Lapedriza, Agata and Oliva, Aude and Torralba, Antonio},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2016},
  doi       = {10.1109/CVPR.2016.319},
  url       = {https://mlanthology.org/cvpr/2016/zhou2016cvpr-learning-a/}
}