Learning Curves for Multi-Task Gaussian Process Regression

Abstract

We study the average-case performance of multi-task Gaussian process (GP) regression as captured in the learning curve, i.e. the average Bayes error for a chosen task versus the total number of examples $n$ for all tasks. For GP covariances that are the product of an input-dependent covariance function and a free-form inter-task covariance matrix, we show that accurate approximations for the learning curve can be obtained for an arbitrary number of tasks $T$. We use these to study the asymptotic learning behaviour for large $n$. Surprisingly, multi-task learning can be asymptotically essentially useless: examples from other tasks only help when the degree of inter-task correlation, $\rho$, is near its maximal value $\rho=1$. This effect is most extreme for learning of smooth target functions as described by e.g. squared exponential kernels. We also demonstrate that when learning many tasks, the learning curves separate into an initial phase, where the Bayes error on each task is reduced down to a plateau value by "collective learning" even though most tasks have not seen examples, and a final decay that occurs only once the number of examples is proportional to the number of tasks.
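The covariance structure studied in the paper is a product of an input kernel and a free-form inter-task covariance matrix. The sketch below (not the authors' code) shows this structure in standard GP regression formulas for a toy two-task setting; the squared exponential kernel, the correlation value rho, the shared latent function and all parameter values are illustrative assumptions.

import numpy as np

def rbf(x1, x2, length=0.3):
    # Squared exponential input covariance k(x, x').
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

# Free-form inter-task covariance for T = 2 tasks with correlation rho (assumed value).
rho = 0.8
K_tasks = np.array([[1.0, rho],
                    [rho, 1.0]])

# Toy training data: inputs x, task labels t (0 or 1), noisy targets y.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 40)
t = rng.integers(0, 2, 40)
f = np.sin(2 * np.pi * x)                    # shared latent trend, purely illustrative
y = f + 0.1 * rng.standard_normal(40)

# Full covariance: inter-task entry times input kernel, plus observation noise.
noise = 0.1 ** 2
K = K_tasks[np.ix_(t, t)] * rbf(x, x) + noise * np.eye(len(x))

# Posterior mean for task 0 on a test grid (standard GP regression).
xs = np.linspace(0, 1, 100)
ts = np.zeros(100, dtype=int)
Ks = K_tasks[np.ix_(ts, t)] * rbf(xs, x)
mean = Ks @ np.linalg.solve(K, y)

# Posterior variance; averaging this over test inputs gives the Bayes error
# for task 0, the quantity whose average defines the learning curve.
Kss = K_tasks[0, 0] * rbf(xs, xs)
var = np.diag(Kss - Ks @ np.linalg.solve(K, Ks.T))
print(mean[:5], var[:5])

In this parameterization, examples from task 1 enter the prediction for task 0 only through the off-diagonal entry rho of K_tasks, which is the inter-task correlation whose effect on the asymptotic learning curve the paper analyzes.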

Cite

Text

Sollich and Ashton. "Learning Curves for Multi-Task Gaussian Process Regression." Neural Information Processing Systems, 2012.

Markdown

[Sollich and Ashton. "Learning Curves for Multi-Task Gaussian Process Regression." Neural Information Processing Systems, 2012.](https://mlanthology.org/neurips/2012/sollich2012neurips-learning/)

BibTeX

@inproceedings{sollich2012neurips-learning,
  title     = {{Learning Curves for Multi-Task Gaussian Process Regression}},
  author    = {Sollich, Peter and Ashton, Simon},
  booktitle = {Neural Information Processing Systems},
  year      = {2012},
  pages     = {1781--1789},
  url       = {https://mlanthology.org/neurips/2012/sollich2012neurips-learning/}
}