Adaptive Bandits: Towards the Best History-Dependent Strategy

Odalric, Maillard; Munos, Rémi

Adaptive Bandits: Towards the Best History-Dependent Strategy

AISTATS 2011 pp. 570-578

/aistats/2011/odalric2011aistats-adaptive/

Abstract

We consider multi-armed bandit games with possibly adaptive opponents. We introduce models $\Theta$ of constraints based on equivalence classes on the common history (information shared by the player and the opponent) which define two learning scenarios: (1) The opponent is constrained, i.e. he provides rewards that are stochastic functions of equivalence classes defined by some model $\theta^* \in \Theta$. The regret is measured with respect to (w.r.t.) the best history-dependent strategy. (2) The opponent is arbitrary and we measure the regret w.r.t. the best strategy among all mappings from classes to actions (i.e. the best history-class-based strategy) for the best model in $\Theta$. This allows to model opponents (case 1) or strategies (case 2) which handles finite memory, periodicity, standard stochastic bandits and other situations. When $\Theta=\{ \theta \}$, i.e. only one model is considered, we derive tractable algorithms achieving a tight regret (at time $T$) bounded by $\tilde{O}(\sqrt{TAC})$, where $C$ is the number of classes of $\theta$. Now, when many models are available, all known algorithms achieving a nice regret $O(\sqrt{T})$ are unfortunately not tractable and scale poorly with the number of models $|\Theta|$. Our contribution here is to provide tractable algorithms with regret bounded by $T^{2/3}C^{1/3}\log(|\Theta|)^{1/2}$.

PDF AISTATS Semantic Scholar

Cite

Text

Odalric and Munos. "Adaptive Bandits: Towards the Best History-Dependent Strategy." Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011.

Markdown

[Odalric and Munos. "Adaptive Bandits: Towards the Best History-Dependent Strategy." Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011.](https://mlanthology.org/aistats/2011/odalric2011aistats-adaptive/)

BibTeX

@inproceedings{odalric2011aistats-adaptive,
  title     = {{Adaptive Bandits: Towards the Best History-Dependent Strategy}},
  author    = {Odalric, Maillard and Munos, Rémi},
  booktitle = {Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics},
  year      = {2011},
  pages     = {570-578},
  volume    = {15},
  url       = {https://mlanthology.org/aistats/2011/odalric2011aistats-adaptive/}
}