Asymptotically Optimal Information-Directed Sampling
Abstract
We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines the asymptotic lower bound. Our analysis sheds light on how IDS balances the trade-off between regret and information and uncovers a surprising connection between the recently proposed primal-dual methods and the IDS algorithm. We demonstrate empirically that IDS is competitive with UCB in finite time, and can be significantly better in the asymptotic regime.
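To illustrate the general mechanism the abstract refers to, the sketch below shows the core step of frequentist IDS: choosing a sampling distribution over actions that minimizes the information ratio, i.e. squared expected (estimated) regret divided by (estimated) information gain. This is a generic, hypothetical illustration, not the paper's surrogate information gain or its exact algorithm; the function `ids_distribution` and the inputs `gaps` and `info_gains` are assumptions for the example. It relies on the standard fact that the minimizing distribution is supported on at most two actions.

```python
import numpy as np

def ids_distribution(gaps, info_gains, grid=1000):
    """Approximately minimize the information ratio
        (sum_a mu_a * gaps[a])^2 / (sum_a mu_a * info_gains[a])
    over distributions mu on the actions.

    gaps[a]       : estimated suboptimality gap of action a   (assumed given)
    info_gains[a] : estimated information gain of action a    (assumed given)

    The minimizer is supported on at most two actions, so we search over
    all action pairs and a grid of mixing weights.
    """
    n = len(gaps)
    best_ratio, best_dist = np.inf, None
    qs = np.linspace(0.0, 1.0, grid)
    for i in range(n):
        for j in range(n):
            # mix action i (weight q) with action j (weight 1 - q)
            regret = qs * gaps[i] + (1 - qs) * gaps[j]
            info = qs * info_gains[i] + (1 - qs) * info_gains[j]
            ratio = np.where(info > 0, regret ** 2 / info, np.inf)
            k = int(np.argmin(ratio))
            if ratio[k] < best_ratio:
                best_ratio = ratio[k]
                dist = np.zeros(n)
                dist[i] += qs[k]
                dist[j] += 1 - qs[k]
                best_dist = dist
    return best_dist, best_ratio

# Toy usage: three actions; IDS typically mixes a low-regret action
# with a more informative one rather than committing to either.
dist, ratio = ids_distribution(np.array([0.1, 0.3, 0.5]),
                               np.array([0.05, 0.2, 0.4]))
print(dist, ratio)
```

In the paper's setting, the information gain is replaced by a surrogate derived from the optimization problem defining the asymptotic lower bound; the illustrative quantities above stand in for whatever estimates the learner maintains at a given round.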
Cite
Text
Kirschner et al. "Asymptotically Optimal Information-Directed Sampling." Conference on Learning Theory, 2021.

Markdown
[Kirschner et al. "Asymptotically Optimal Information-Directed Sampling." Conference on Learning Theory, 2021.](https://mlanthology.org/colt/2021/kirschner2021colt-asymptotically/)

BibTeX
@inproceedings{kirschner2021colt-asymptotically,
title = {{Asymptotically Optimal Information-Directed Sampling}},
author = {Kirschner, Johannes and Lattimore, Tor and Vernade, Claire and Szepesvari, Csaba},
booktitle = {Conference on Learning Theory},
year = {2021},
pages = {2777--2821},
volume = {134},
url = {https://mlanthology.org/colt/2021/kirschner2021colt-asymptotically/}
}