Adaptive Bandits: Towards the Best History-Dependent Strategy
Abstract
We consider multi-armed bandit games with possibly adaptive opponents. We introduce models $\Theta$ of constraints based on equivalence classes on the common history (information shared by the player and the opponent) which define two learning scenarios: (1) The opponent is constrained, i.e. he provides rewards that are stochastic functions of equivalence classes defined by some model $\theta^* \in \Theta$. The regret is measured with respect to (w.r.t.) the best history-dependent strategy. (2) The opponent is arbitrary and we measure the regret w.r.t. the best strategy among all mappings from classes to actions (i.e. the best history-class-based strategy) for the best model in $\Theta$. This allows to model opponents (case 1) or strategies (case 2) which handles finite memory, periodicity, standard stochastic bandits and other situations. When $\Theta=\{ \theta \}$, i.e. only one model is considered, we derive tractable algorithms achieving a tight regret (at time $T$) bounded by $\tilde{O}(\sqrt{TAC})$, where $C$ is the number of classes of $\theta$. Now, when many models are available, all known algorithms achieving a nice regret $O(\sqrt{T})$ are unfortunately not tractable and scale poorly with the number of models $|\Theta|$. Our contribution here is to provide tractable algorithms with regret bounded by $T^{2/3}C^{1/3}\log(|\Theta|)^{1/2}$.
Cite
Text
Odalric and Munos. "Adaptive Bandits: Towards the Best History-Dependent Strategy." Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011.Markdown
[Odalric and Munos. "Adaptive Bandits: Towards the Best History-Dependent Strategy." Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, 2011.](https://mlanthology.org/aistats/2011/odalric2011aistats-adaptive/)BibTeX
@inproceedings{odalric2011aistats-adaptive,
title = {{Adaptive Bandits: Towards the Best History-Dependent Strategy}},
author = {Odalric, Maillard and Munos, Rémi},
booktitle = {Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics},
year = {2011},
pages = {570-578},
volume = {15},
url = {https://mlanthology.org/aistats/2011/odalric2011aistats-adaptive/}
}