Randomized Exploration in Generalized Linear Bandits

Abstract

We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL, samples a generalized linear model (GLM) from the Laplace approximation to the posterior distribution. The second, GLM-FPL, fits a GLM to a randomly perturbed history of past rewards. We analyze both algorithms and derive $\tilde{O}(d \sqrt{n \log K})$ upper bounds on their $n$-round regret, where $d$ is the number of features and $K$ is the number of arms. The former bound improves on prior work, while the latter is the first such bound for Gaussian noise perturbations in non-linear models. We empirically evaluate both GLM-TSL and GLM-FPL on logistic bandits, and apply GLM-FPL to neural network bandits. Our work showcases the role of randomization, beyond posterior sampling, in exploration.
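To make the two exploration schemes concrete, below is a minimal sketch of how each algorithm might choose an arm in a logistic bandit. The function names, the L2 regularizer lam, the fixed number of Newton steps, and the scale parameter a are illustrative choices made here for readability; they are not the paper's exact pseudocode or tuning.

import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def fit_logistic(X, y, lam=1.0, num_newton=20):
    """Regularized logistic-regression fit by Newton's method.

    X : (t, d) features of pulled arms, y : (t,) rewards (possibly perturbed).
    Returns the estimate and the Hessian of the regularized negative log-likelihood.
    """
    t, d = X.shape
    theta = np.zeros(d)
    for _ in range(num_newton):
        mu = sigmoid(X @ theta)
        grad = X.T @ (mu - y) + lam * theta
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None]) + lam * np.eye(d)
        theta -= np.linalg.solve(hess, grad)
    return theta, hess

def glm_tsl_arm(arms, X, y, a=1.0, lam=1.0):
    """GLM-TSL-style choice: sample a GLM from a Laplace approximation to the posterior."""
    theta_hat, hess = fit_logistic(X, y, lam)
    # Laplace approximation: Gaussian centered at the MAP estimate with
    # covariance equal to the (scaled) inverse Hessian at that estimate.
    theta_tilde = np.random.multivariate_normal(theta_hat, a**2 * np.linalg.inv(hess))
    return int(np.argmax(sigmoid(arms @ theta_tilde)))

def glm_fpl_arm(arms, X, y, a=1.0, lam=1.0):
    """GLM-FPL-style choice: refit the GLM to a Gaussian-perturbed reward history."""
    y_tilde = y + a * np.random.randn(len(y))   # perturbed past rewards
    theta_tilde, _ = fit_logistic(X, y_tilde, lam)
    return int(np.argmax(sigmoid(arms @ theta_tilde)))

Here arms is the (K, d) matrix of arm features and (X, y) is the history of pulled-arm features and observed rewards; both routines then pull the arm with the highest mean reward under the randomized model.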

Cite

Text

Kveton et al. "Randomized Exploration in Generalized Linear Bandits." Artificial Intelligence and Statistics, 2020.

Markdown

[Kveton et al. "Randomized Exploration in Generalized Linear Bandits." Artificial Intelligence and Statistics, 2020.](https://mlanthology.org/aistats/2020/kveton2020aistats-randomized/)

BibTeX

@inproceedings{kveton2020aistats-randomized,
  title     = {{Randomized Exploration in Generalized Linear Bandits}},
  author    = {Kveton, Branislav and Zaheer, Manzil and Szepesvari, Csaba and Li, Lihong and Ghavamzadeh, Mohammad and Boutilier, Craig},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2020},
  pages     = {2066--2076},
  volume    = {108},
  url       = {https://mlanthology.org/aistats/2020/kveton2020aistats-randomized/}
}