Asymptotically Optimal Information-Directed Sampling

Abstract

We introduce a simple and efficient algorithm for stochastic linear bandits with finitely many actions that is asymptotically optimal and (nearly) worst-case optimal in finite time. The approach is based on the frequentist information-directed sampling (IDS) framework, with a surrogate for the information gain that is informed by the optimization problem that defines the asymptotic lower bound. Our analysis sheds light on how IDS balances the trade-off between regret and information and uncovers a surprising connection between the recently proposed primal-dual methods and the IDS algorithm. We demonstrate empirically that IDS is competitive with UCB in finite time, and can be significantly better in the asymptotic regime.
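As a rough illustration of the regret-information trade-off the abstract refers to, the sketch below implements the generic IDS selection step: choose a distribution over actions minimizing the information ratio (squared expected regret divided by expected information gain). This is a minimal, assumption-laden sketch, not the paper's algorithm; in particular, the per-action gap estimates `deltas` and the surrogate information gains `infos` are placeholders, whereas the paper's contribution is a specific surrogate tied to the asymptotic lower bound. The sketch uses the known fact that the ratio's minimizer can be supported on at most two actions.

```python
import numpy as np

def ids_distribution(deltas, infos):
    """Generic information-directed sampling step (illustrative sketch only).

    deltas: estimated regret (gap) for each action
    infos:  surrogate information gain for each action
    Returns a distribution over actions minimizing the information ratio
        (expected regret)^2 / (expected information gain),
    searched over all pairs of actions and mixing weights on each pair.
    """
    deltas = np.asarray(deltas, dtype=float)
    infos = np.asarray(infos, dtype=float)
    n = len(deltas)
    best_ratio, best_dist = np.inf, np.zeros(n)
    weights = np.linspace(0.0, 1.0, 1001)  # candidate mixing weights
    for i in range(n):
        for j in range(n):
            d = (1 - weights) * deltas[i] + weights * deltas[j]
            g = (1 - weights) * infos[i] + weights * infos[j]
            with np.errstate(divide="ignore", invalid="ignore"):
                ratios = np.where(g > 0, d ** 2 / g, np.inf)
            k = int(np.argmin(ratios))
            if ratios[k] < best_ratio:
                best_ratio = ratios[k]
                dist = np.zeros(n)
                dist[i] += 1 - weights[k]
                dist[j] += weights[k]
                best_dist = dist
    return best_dist, best_ratio

# Hypothetical example: three actions with assumed gaps and information gains.
p, ratio = ids_distribution(deltas=[0.1, 0.3, 0.5], infos=[0.05, 0.2, 0.6])
print(p, ratio)
```

Note the design choice the ratio encodes: an action with a large estimated gap can still receive probability mass if it is sufficiently informative, which is how IDS trades immediate regret for information.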

Cite

Text

Kirschner et al. "Asymptotically Optimal Information-Directed Sampling." Conference on Learning Theory, 2021.

Markdown

[Kirschner et al. "Asymptotically Optimal Information-Directed Sampling." Conference on Learning Theory, 2021.](https://mlanthology.org/colt/2021/kirschner2021colt-asymptotically/)

BibTeX

@inproceedings{kirschner2021colt-asymptotically,
  title     = {{Asymptotically Optimal Information-Directed Sampling}},
  author    = {Kirschner, Johannes and Lattimore, Tor and Vernade, Claire and Szepesvari, Csaba},
  booktitle = {Conference on Learning Theory},
  year      = {2021},
  pages     = {2777--2821},
  volume    = {134},
  url       = {https://mlanthology.org/colt/2021/kirschner2021colt-asymptotically/}
}