Robust Risk-Averse Stochastic Multi-Armed Bandits

Abstract

We study a variant of the standard stochastic multi-armed bandit problem in which one is interested not in the arm with the best mean, but in the arm maximizing some coherent risk measure criterion. Further, we study the deviations of the regret, rather than the less informative expected regret. We provide an algorithm, called RA-UCB, to solve this problem, together with a high-probability bound on its regret.
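As a rough illustration of the setting (not the paper's RA-UCB algorithm or its bound), one can replace the empirical mean in a UCB-style index with an empirical coherent risk measure such as CVaR. The sketch below uses the generic `sqrt(log t / n)` exploration bonus and an assumed risk level `alpha`; both choices are illustrative assumptions, not taken from the paper.

```python
import math

def empirical_cvar(samples, alpha=0.1):
    """Empirical CVaR at level alpha: the mean of the worst
    alpha-fraction of observed rewards (lower tail, so a higher
    CVaR means a less risky arm). Illustrative estimator only."""
    s = sorted(samples)
    k = max(1, int(alpha * len(s)))
    return sum(s[:k]) / k

def select_arm(history, t, alpha=0.1, c=1.0):
    """Pick the arm maximizing empirical CVaR plus a generic
    UCB-style exploration bonus; `history[i]` is the list of
    rewards observed so far from arm i."""
    best_index, best_arm = None, 0
    for i, samples in enumerate(history):
        if not samples:
            return i  # play each arm at least once first
        index = empirical_cvar(samples, alpha) + c * math.sqrt(math.log(t) / len(samples))
        if best_index is None or index > best_index:
            best_index, best_arm = index, i
    return best_arm
```

With `alpha = 1.0` the index degenerates to the usual mean-based UCB; smaller `alpha` makes the selection increasingly averse to arms with heavy lower tails, which is the risk-averse criterion the abstract refers to.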

Cite

Text

Maillard. "Robust Risk-Averse Stochastic Multi-Armed Bandits." International Conference on Algorithmic Learning Theory, 2013. doi:10.1007/978-3-642-40935-6_16

Markdown

[Maillard. "Robust Risk-Averse Stochastic Multi-Armed Bandits." International Conference on Algorithmic Learning Theory, 2013.](https://mlanthology.org/alt/2013/maillard2013alt-robust/) doi:10.1007/978-3-642-40935-6_16

BibTeX

@inproceedings{maillard2013alt-robust,
  title     = {{Robust Risk-Averse Stochastic Multi-Armed Bandits}},
  author    = {Maillard, Odalric-Ambrym},
  booktitle = {International Conference on Algorithmic Learning Theory},
  year      = {2013},
  pages     = {218--233},
  doi       = {10.1007/978-3-642-40935-6_16},
  url       = {https://mlanthology.org/alt/2013/maillard2013alt-robust/}
}