Deep Kernel Learning

Abstract

We introduce scalable deep kernels, which combine the structural properties of deep learning architectures with the non-parametric flexibility of kernel methods. Specifically, we transform the inputs of a spectral mixture base kernel with a deep architecture, using local kernel interpolation, inducing points, and structure exploiting (Kronecker and Toeplitz) algebra for a scalable kernel representation. These closed-form kernels can be used as drop-in replacements for standard kernels, with benefits in expressive power and scalability. We jointly learn the properties of these kernels through the marginal likelihood of a Gaussian process. Inference and learning cost $O(n)$ for $n$ training points, and predictions cost $O(1)$ per test point. On a large and diverse collection of applications, including a dataset with 2 million examples, we show improved performance over scalable Gaussian processes with flexible kernel learning models, and stand-alone deep architectures.
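The core construction, a base kernel applied to inputs warped by a deep network and trained jointly through the GP marginal likelihood, can be sketched in a few lines. The snippet below is an illustrative NumPy sketch only: it uses a fixed-weight two-layer network with an RBF base kernel rather than the paper's spectral mixture kernel, and it omits the local kernel interpolation, inducing points, and Kronecker/Toeplitz algebra that give the $O(n)$ scaling. All names (`g`, `deep_rbf_kernel`, `neg_log_marginal_likelihood`) and the toy data are hypothetical, not taken from the paper's code.

```python
# Minimal sketch of a "deep kernel": k(x, x') = k_base(g(x), g(x')),
# where g(.) is a small neural network warping the inputs.
import numpy as np

rng = np.random.default_rng(0)

# --- a tiny feature map g(x): one tanh hidden layer (weights fixed here;
#     in deep kernel learning they are optimised with the GP objective) ---
W1 = rng.normal(size=(1, 8))   # input dim 1 -> 8 hidden units
b1 = rng.normal(size=8)
W2 = rng.normal(size=(8, 2))   # 8 hidden units -> 2 output features
b2 = rng.normal(size=2)

def g(X):
    """Warp raw inputs X of shape (n, 1) into learned features of shape (n, 2)."""
    return np.tanh(X @ W1 + b1) @ W2 + b2

# --- RBF base kernel applied to the warped inputs (the paper uses a
#     spectral mixture base kernel instead) ---
def deep_rbf_kernel(X, Z, lengthscale=1.0, variance=1.0):
    GX, GZ = g(X), g(Z)
    sq = ((GX[:, None, :] - GZ[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq / lengthscale ** 2)

# --- exact GP negative log marginal likelihood: the objective minimised
#     jointly over network weights and kernel hyperparameters ---
def neg_log_marginal_likelihood(X, y, noise=0.1):
    n = len(y)
    K = deep_rbf_kernel(X, X) + noise ** 2 * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * n * np.log(2 * np.pi)

# toy data: a noisy step function, the kind of sharp structure a warped
# input space can capture more easily than a stationary kernel on raw inputs
X = np.linspace(-1, 1, 50)[:, None]
y = (X[:, 0] > 0).astype(float) + 0.05 * rng.normal(size=50)
print("NLML with random network weights:", neg_log_marginal_likelihood(X, y))
```

In the method described above, the network weights (here `W1`, `b1`, `W2`, `b2`) and the kernel hyperparameters would be learned jointly by gradient-based optimisation of this marginal likelihood, with the scalable structure-exploiting algebra replacing the exact Cholesky solve.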

Cite

Text

Wilson et al. "Deep Kernel Learning." International Conference on Artificial Intelligence and Statistics, 2016.

Markdown

[Wilson et al. "Deep Kernel Learning." International Conference on Artificial Intelligence and Statistics, 2016.](https://mlanthology.org/aistats/2016/wilson2016aistats-deep/)

BibTeX

@inproceedings{wilson2016aistats-deep,
  title     = {{Deep Kernel Learning}},
  author    = {Wilson, Andrew Gordon and Hu, Zhiting and Salakhutdinov, Ruslan and Xing, Eric P.},
  booktitle = {International Conference on Artificial Intelligence and Statistics},
  year      = {2016},
  pages     = {370--378},
  url       = {https://mlanthology.org/aistats/2016/wilson2016aistats-deep/}
}