Meta-Learning with Stochastic Linear Bandits

Abstract

We investigate meta-learning procedures in the setting of stochastic linear bandit tasks. The goal is to select a learning algorithm that works well on average over a class of bandit tasks sampled from a task distribution. Inspired by recent work on learning-to-learn linear regression, we consider a class of bandit algorithms that implement a regularized version of the well-known OFUL algorithm, where the regularization is a squared Euclidean distance to a bias vector. We first study the benefit of the biased OFUL algorithm in terms of regret minimization. We then propose two strategies to estimate the bias within the learning-to-learn setting. We show, both theoretically and experimentally, that when the number of tasks grows and the variance of the task distribution is small, our strategies have a significant advantage over learning the tasks in isolation.
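
To make the biased regularization idea concrete, here is a minimal sketch, not the authors' exact algorithm: assuming the regularizer is lam * ||theta - bias||^2 for a bias vector h, the per-task ridge estimate has a simple closed form, and an OFUL-style rule plays the arm with the highest upper confidence bound. The constant exploration parameter beta (standing in for OFUL's time-dependent confidence radius), the function names, and the running-average bias update across tasks are illustrative assumptions.

import numpy as np

def biased_ridge_estimate(X, r, bias, lam=1.0):
    # Closed form of min_theta ||X theta - r||^2 + lam * ||theta - bias||^2:
    # theta_hat = (X^T X + lam I)^{-1} (X^T r + lam * bias).
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ r + lam * bias)

def ucb_action(arms, X, r, bias, lam=1.0, beta=1.0):
    # Optimistic arm selection around the biased estimate:
    # argmax_x <x, theta_hat> + beta * ||x||_{A^{-1}},  A = X^T X + lam I.
    d = arms.shape[1]
    A_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))
    theta_hat = biased_ridge_estimate(X, r, bias, lam)
    widths = np.sqrt(np.einsum("id,de,ie->i", arms, A_inv, arms))
    return int(np.argmax(arms @ theta_hat + beta * widths))

def update_bias(theta_hats):
    # One plausible learning-to-learn estimator (hypothetical, for
    # illustration): average the ridge estimates of the tasks seen so far.
    return np.mean(theta_hats, axis=0)

With bias = 0 this reduces to the standard ridge estimate used by OFUL, so the sketch also illustrates the baseline of learning each task in isolation; when the task distribution has small variance, shrinking toward a learned bias reduces estimation error in the early rounds of each new task.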

Cite

Text

Cella et al. "Meta-Learning with Stochastic Linear Bandits." International Conference on Machine Learning, 2020.

Markdown

[Cella et al. "Meta-Learning with Stochastic Linear Bandits." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/cella2020icml-metalearning/)

BibTeX

@inproceedings{cella2020icml-metalearning,
  title     = {{Meta-Learning with Stochastic Linear Bandits}},
  author    = {Cella, Leonardo and Lazaric, Alessandro and Pontil, Massimiliano},
  booktitle = {International Conference on Machine Learning},
  year      = {2020},
  pages     = {1360--1370},
  volume    = {119},
  url       = {https://mlanthology.org/icml/2020/cella2020icml-metalearning/}
}