Rates of Convergence for Sparse Variational Gaussian Process Regression
Abstract
Excellent variational approximations to Gaussian process posteriors have been developed which avoid the $\mathcal{O}\left(N^3\right)$ scaling with dataset size $N$. They reduce the computational cost to $\mathcal{O}\left(NM^2\right)$, with $M\ll N$ the number of inducing variables, which summarise the process. While the computational cost seems to be linear in $N$, the true complexity of the algorithm depends on how $M$ must increase to ensure a certain quality of approximation. We show that with high probability the KL divergence can be made arbitrarily small by growing $M$ more slowly than $N$. A particular case is that for regression with normally distributed inputs in $D$ dimensions with the Squared Exponential kernel, $M=\mathcal{O}(\log^D N)$ suffices. Our results show that as datasets grow, Gaussian process posteriors can be approximated cheaply, and provide a concrete rule for how to increase $M$ in continual learning scenarios.
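For concreteness, below is a minimal NumPy sketch (an illustration added here, not code from the paper) of the collapsed sparse variational lower bound that these results analyse, evaluated at $\mathcal{O}\left(NM^2\right)$ cost, together with an $M \propto \log N$ growth rule in the spirit of the $M=\mathcal{O}(\log^D N)$ rate. The kernel hyperparameters, the constant in the growth rule, and the toy data are all placeholder assumptions.

```python
import numpy as np
from scipy.linalg import solve_triangular

def se_kernel(A, B, variance=1.0, lengthscale=1.0):
    """Squared Exponential kernel matrix between the rows of A and B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq / lengthscale ** 2)

def collapsed_elbo(X, y, Z, noise_var=0.1, jitter=1e-8):
    """Collapsed sparse variational lower bound on log p(y), at O(N M^2) cost."""
    N, M = X.shape[0], Z.shape[0]
    Kmm = se_kernel(Z, Z) + jitter * np.eye(M)
    Kmn = se_kernel(Z, X)
    knn_diag = np.ones(N)  # for the SE kernel above with variance 1, k(x, x) = 1

    Lmm = np.linalg.cholesky(Kmm)                                    # O(M^3)
    A = solve_triangular(Lmm, Kmn, lower=True) / np.sqrt(noise_var)  # M x N, O(N M^2)
    B = np.eye(M) + A @ A.T                                          # O(N M^2)
    LB = np.linalg.cholesky(B)
    c = solve_triangular(LB, A @ y, lower=True) / np.sqrt(noise_var)

    # log N(y | 0, Q_nn + noise_var * I) via the matrix determinant and Woodbury lemmas
    log_marginal = (
        -0.5 * N * np.log(2 * np.pi)
        - np.log(np.diag(LB)).sum()
        - 0.5 * N * np.log(noise_var)
        - 0.5 * (y @ y) / noise_var
        + 0.5 * (c @ c)
    )
    # trace correction: -(1 / (2 noise_var)) * tr(K_nn - Q_nn)
    trace_term = -0.5 * (knn_diag.sum() / noise_var - (A ** 2).sum())
    return log_marginal + trace_term

# Toy usage: grow the number of inducing points like log N (D = 1 here),
# mirroring the abstract's M = O(log^D N) rate for SE kernels with Gaussian inputs.
rng = np.random.default_rng(0)
N = 2000
X = rng.normal(size=(N, 1))
y = np.sin(3 * X[:, 0]) + 0.3 * rng.normal(size=N)
M = int(np.ceil(5 * np.log(N)))                # illustrative constant, not from the paper
Z = X[rng.choice(N, size=M, replace=False)]    # inducing inputs picked from the data
print(f"N={N}, M={M}, ELBO={collapsed_elbo(X, y, Z):.2f}")
```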
Cite
Text
Burt et al. "Rates of Convergence for Sparse Variational Gaussian Process Regression." International Conference on Machine Learning, 2019.
Markdown
[Burt et al. "Rates of Convergence for Sparse Variational Gaussian Process Regression." International Conference on Machine Learning, 2019.](https://mlanthology.org/icml/2019/burt2019icml-rates/)
BibTeX
@inproceedings{burt2019icml-rates,
title = {{Rates of Convergence for Sparse Variational Gaussian Process Regression}},
author = {Burt, David and Rasmussen, Carl Edward and van der Wilk, Mark},
booktitle = {International Conference on Machine Learning},
year = {2019},
pages = {862--871},
volume = {97},
url = {https://mlanthology.org/icml/2019/burt2019icml-rates/}
}