Adaptation to the Range in K-Armed Bandits

Abstract

We consider stochastic bandit problems with $K$ arms, each associated with a distribution supported on a given finite range $[m,M]$. We do not assume that the range $[m,M]$ is known and show that there is a cost for learning this range. Indeed, a new trade-off between distribution-dependent and distribution-free regret bounds arises, which prevents one from simultaneously achieving the typical $\ln T$ and $\sqrt{T}$ bounds. For instance, a $\sqrt{T}$ distribution-free regret bound may only be achieved if the distribution-dependent regret bounds are at least of order $\sqrt{T}$. We exhibit a strategy achieving the rates for regret imposed by the new trade-off.
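To make the setting concrete, the following is a minimal simulation sketch of a $K$-armed bandit with rewards in an unknown range, played by a plain UCB1-style index rule that plugs in the empirical range of observed rewards in place of the true $[m,M]$. This is only an illustration of the problem setting; it is not the strategy analysed in the paper, and the arm means, reward distributions, and helper names are hypothetical choices made for the example.

```python
import math
import random

def run_range_adaptive_ucb(means, half_width, horizon, seed=0):
    """Play a K-armed bandit whose reward range [m, M] is unknown.

    Hedged sketch: a UCB1 variant whose exploration bonus is scaled by
    the empirical range of all rewards seen so far, standing in for the
    unknown true range. Rewards are uniform on [mean - hw, mean + hw].
    Returns (regret, pull counts per arm).
    """
    rng = random.Random(seed)
    K = len(means)
    counts = [0] * K
    sums = [0.0] * K
    lo, hi = float("inf"), float("-inf")  # running estimate of [m, M]

    total = 0.0
    for t in range(1, horizon + 1):
        if t <= K:
            arm = t - 1  # initialisation: pull each arm once
        else:
            width = max(hi - lo, 1e-12)  # empirical range (avoids /0)
            arm = max(
                range(K),
                key=lambda a: sums[a] / counts[a]
                + width * math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        r = means[arm] + rng.uniform(-half_width, half_width)
        counts[arm] += 1
        sums[arm] += r
        lo, hi = min(lo, r), max(hi, r)  # update range estimate
        total += r

    regret = max(means) * horizon - total
    return regret, counts
```

Running the sketch on two arms with a clear gap, e.g. `run_range_adaptive_ucb([0.2, 0.5], 0.1, 2000)`, shows the empirical-range plug-in concentrating pulls on the better arm, though the paper proves that no such strategy can match the usual known-range guarantees simultaneously in the distribution-dependent and distribution-free regimes.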

Cite

Text

Hadiji and Stoltz. "Adaptation to the Range in K-Armed Bandits." Journal of Machine Learning Research, 2023.

Markdown

[Hadiji and Stoltz. "Adaptation to the Range in K-Armed Bandits." Journal of Machine Learning Research, 2023.](https://mlanthology.org/jmlr/2023/hadiji2023jmlr-adaptation/)

BibTeX

@article{hadiji2023jmlr-adaptation,
  title     = {{Adaptation to the Range in K-Armed Bandits}},
  author    = {Hadiji, Hédi and Stoltz, Gilles},
  journal   = {Journal of Machine Learning Research},
  year      = {2023},
  pages     = {1--33},
  volume    = {24},
  url       = {https://mlanthology.org/jmlr/2023/hadiji2023jmlr-adaptation/}
}