Deep Convolutional Networks as Shallow Gaussian Processes

Abstract

We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many convolutional filters, extending similar results for dense networks. For a CNN, the equivalent kernel can be computed exactly and, unlike "deep kernels", has very few parameters: only the hyperparameters of the original CNN. Further, we show that this kernel has two properties that allow it to be computed efficiently; the cost of evaluating the kernel for a pair of images is similar to a single forward pass through the original CNN with only one filter per layer. The kernel equivalent to a 32-layer ResNet obtains 0.84% classification error on MNIST, a new record for GPs with a comparable number of parameters.
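The abstract's efficiency claim can be made concrete with a short sketch. The code below is a minimal illustration (not the paper's implementation, and without the residual connections) of the kernel recursion for a plain ReLU CNN: only same-location covariances between the two inputs are propagated, because with i.i.d. weights the cross-location terms vanish, so the cost matches one forward pass with a single filter per layer. The names `cnn_gp_kernel` and `relu_cov`, the 1-D inputs, and the `sw2 / width` weight-variance scaling are all illustrative assumptions.

```python
import numpy as np

def relu_cov(kxx, kxy, kyy):
    """E[relu(u) relu(v)] for zero-mean jointly Gaussian (u, v) with the given
    (co)variances: the first-order arc-cosine kernel of Cho & Saul (2009)."""
    s = np.sqrt(kxx * kyy) + 1e-12          # avoid division by zero
    theta = np.arccos(np.clip(kxy / s, -1.0, 1.0))
    return s * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2.0 * np.pi)

def cnn_gp_kernel(x, y, n_layers=3, width=3, sw2=2.0, sb2=0.1):
    """Kernel of the GP equivalent to a deep ReLU CNN, for two 1-D inputs
    (2-D images work the same way, with patch sums over both spatial axes)."""
    patch = np.ones(width)
    # First-layer pre-activation covariances at each spatial location:
    # bias variance plus the weight variance times the patch inner product.
    kxx = sb2 + (sw2 / width) * np.convolve(x * x, patch, mode='valid')
    kxy = sb2 + (sw2 / width) * np.convolve(x * y, patch, mode='valid')
    kyy = sb2 + (sw2 / width) * np.convolve(y * y, patch, mode='valid')
    for _ in range(n_layers - 1):
        # Push the covariances through the ReLU, then through the next
        # convolution's patch sum. Only same-location terms are needed.
        vxx = relu_cov(kxx, kxx, kxx)        # = kxx / 2 for the ReLU
        vxy = relu_cov(kxx, kxy, kyy)
        vyy = relu_cov(kyy, kyy, kyy)
        kxx = sb2 + (sw2 / width) * np.convolve(vxx, patch, mode='valid')
        kxy = sb2 + (sw2 / width) * np.convolve(vxy, patch, mode='valid')
        kyy = sb2 + (sw2 / width) * np.convolve(vyy, patch, mode='valid')
    # Final dense readout: i.i.d. weights average over spatial locations.
    return sb2 + sw2 * relu_cov(kxx, kxy, kyy).mean()

# Example: a single scalar kernel evaluation for two random signals.
rng = np.random.default_rng(0)
x, y = rng.normal(size=28), rng.normal(size=28)
print(cnn_gp_kernel(x, y))
```

Filling an entire Gram matrix with such pairwise evaluations, one per pair of images, is what a GP classification experiment like the MNIST result above requires.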

Cite

Text

Garriga-Alonso et al. "Deep Convolutional Networks as Shallow Gaussian Processes." International Conference on Learning Representations, 2019.

Markdown

[Garriga-Alonso et al. "Deep Convolutional Networks as Shallow Gaussian Processes." International Conference on Learning Representations, 2019.](https://mlanthology.org/iclr/2019/garrigaalonso2019iclr-deep/)

BibTeX

@inproceedings{garrigaalonso2019iclr-deep,
  title     = {{Deep Convolutional Networks as Shallow Gaussian Processes}},
  author    = {Garriga-Alonso, Adrià and Rasmussen, Carl Edward and Aitchison, Laurence},
  booktitle = {International Conference on Learning Representations},
  year      = {2019},
  url       = {https://mlanthology.org/iclr/2019/garrigaalonso2019iclr-deep/}
}