Robust Risk-Averse Stochastic Multi-Armed Bandits

Abstract

We study a variant of the standard stochastic multi-armed bandit problem in which one is interested not in the arm with the best mean, but in the arm maximizing some coherent risk measure criterion. Further, we study the deviations of the regret, rather than the less informative expected regret. We provide an algorithm, called RA-UCB, to solve this problem, together with a high-probability bound on its regret.
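As a rough illustration of the setting (not the paper's RA-UCB algorithm or its bound), one can replace the empirical mean in a UCB-style index with an empirical coherent risk measure such as CVaR. The sketch below uses the generic `sqrt(log t / n)` exploration bonus and an assumed risk level `alpha`; both choices are illustrative assumptions, not taken from the paper.

```python
import math

def empirical_cvar(samples, alpha=0.1):
    """Empirical CVaR at level alpha: the mean of the worst
    alpha-fraction of observed rewards (lower tail, so a higher
    CVaR means a less risky arm). Illustrative estimator only."""
    s = sorted(samples)
    k = max(1, int(alpha * len(s)))
    return sum(s[:k]) / k

def select_arm(history, t, alpha=0.1, c=1.0):
    """Pick the arm maximizing empirical CVaR plus a generic
    UCB-style exploration bonus; `history[i]` is the list of
    rewards observed so far from arm i."""
    best_index, best_arm = None, 0
    for i, samples in enumerate(history):
        if not samples:
            return i  # play each arm at least once first
        index = empirical_cvar(samples, alpha) + c * math.sqrt(math.log(t) / len(samples))
        if best_index is None or index > best_index:
            best_index, best_arm = index, i
    return best_arm
```

With `alpha = 1.0` the index degenerates to the usual mean-based UCB; smaller `alpha` makes the selection increasingly averse to arms with heavy lower tails, which is the risk-averse criterion the abstract refers to.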

Cite

Text

Maillard. "Robust Risk-Averse Stochastic Multi-Armed Bandits." International Conference on Algorithmic Learning Theory, 2013. doi:10.1007/978-3-642-40935-6_16

Markdown

[Maillard. "Robust Risk-Averse Stochastic Multi-Armed Bandits." International Conference on Algorithmic Learning Theory, 2013.](https://mlanthology.org/alt/2013/maillard2013alt-robust/) doi:10.1007/978-3-642-40935-6_16

BibTeX

@inproceedings{maillard2013alt-robust,
  title     = {{Robust Risk-Averse Stochastic Multi-Armed Bandits}},
  author    = {Maillard, Odalric-Ambrym},
  booktitle = {International Conference on Algorithmic Learning Theory},
  year      = {2013},
  pages     = {218--233},
  doi       = {10.1007/978-3-642-40935-6_16},
  url       = {https://mlanthology.org/alt/2013/maillard2013alt-robust/}
}