Kernel Pooling for Convolutional Neural Networks

Abstract

Convolutional Neural Networks (CNNs) with Bilinear Pooling, initially in their full form and later using compact representations, have yielded impressive performance gains on a wide range of visual tasks, including fine-grained visual categorization, visual question answering, face recognition, and description of texture and style. The key to their success lies in the spatially invariant modeling of pairwise (second-order) feature interactions. In this work, we propose a general pooling framework that captures higher-order interactions of features in the form of kernels. We demonstrate how to approximate kernels such as the Gaussian RBF up to a given order using compact explicit feature maps in a parameter-free manner. Combined with CNNs, the composition of the kernel can be learned from data in an end-to-end fashion via error back-propagation. The proposed kernel pooling scheme is evaluated in terms of both kernel approximation error and visual recognition accuracy. Experimental evaluations demonstrate state-of-the-art performance on commonly used fine-grained recognition datasets.
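
To make the idea of approximating a Gaussian RBF kernel "up to a given order" with an explicit feature map concrete, the following is a minimal sketch and not the authors' implementation. It assumes L2-normalized inputs, so that exp(-gamma * ||x - y||^2) = exp(-2 * gamma) * exp(2 * gamma * <x, y>), and it truncates the Taylor series of the exponential at a chosen order. For clarity it uses exact tensor powers of the input, whose size grows exponentially with the order; a compact variant would replace each tensor power with a low-dimensional approximation. All function names and the value of `gamma` are illustrative assumptions.

```python
import math
from itertools import product


def l2_normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(x, y, gamma):
    """Exact Gaussian RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def taylor_coeffs(gamma, order):
    """Coefficients c_i such that, for unit-norm x and y,
    rbf(x, y) ~= sum_i c_i * <x, y>^i for i = 0..order."""
    scale = math.exp(-2.0 * gamma)
    return [scale * (2.0 * gamma) ** i / math.factorial(i)
            for i in range(order + 1)]


def explicit_feature_map(x, gamma, order):
    """Concatenate sqrt(c_i) * (i-th tensor power of x) for i = 0..order.
    Uses exact tensor powers (assumption made for clarity, not compactness)."""
    feats = []
    for i, c in enumerate(taylor_coeffs(gamma, order)):
        weight = math.sqrt(c)
        # Each index tuple selects one entry of the i-th tensor power of x.
        for idx in product(range(len(x)), repeat=i):
            feats.append(weight * math.prod(x[j] for j in idx))
    return feats


def approx_kernel(x, y, gamma, order):
    """Kernel value recovered as an inner product of explicit feature maps."""
    return dot(explicit_feature_map(x, gamma, order),
               explicit_feature_map(y, gamma, order))


if __name__ == "__main__":
    x = l2_normalize([1.0, 2.0, 2.0])
    y = l2_normalize([2.0, 1.0, 2.0])
    gamma = 0.5  # illustrative value
    exact = rbf_kernel(x, y, gamma)
    for order in (1, 2, 4, 8):
        err = abs(approx_kernel(x, y, gamma, order) - exact)
        print(f"order={order}  abs error={err:.2e}")
```

In this sketch, order 2 corresponds to the pairwise interactions captured by bilinear pooling, and higher orders add higher-order terms. The approximation error on unit-norm inputs shrinks as the order increases.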

Cite

Text

Cui et al. "Kernel Pooling for Convolutional Neural Networks." Conference on Computer Vision and Pattern Recognition, 2017. doi:10.1109/CVPR.2017.325

Markdown

[Cui et al. "Kernel Pooling for Convolutional Neural Networks." Conference on Computer Vision and Pattern Recognition, 2017.](https://mlanthology.org/cvpr/2017/cui2017cvpr-kernel/) doi:10.1109/CVPR.2017.325

BibTeX

@inproceedings{cui2017cvpr-kernel,
  title     = {{Kernel Pooling for Convolutional Neural Networks}},
  author    = {Cui, Yin and Zhou, Feng and Wang, Jiang and Liu, Xiao and Lin, Yuanqing and Belongie, Serge},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2017},
  doi       = {10.1109/CVPR.2017.325},
  url       = {https://mlanthology.org/cvpr/2017/cui2017cvpr-kernel/}
}