Risk-Aversion in Multi-Armed Bandits

Abstract

In stochastic multi-armed bandits the objective is to solve the exploration-exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk-aversion, where the objective is to compete against the arm with the best risk-return trade-off. This setting proves to be intrinsically more difficult than the standard multi-armed bandit setting, due in part to an exploration risk that introduces a regret associated with the variability of an algorithm. Using variance as a measure of risk, we introduce two new algorithms, investigate their theoretical guarantees, and report preliminary empirical results.
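The mean-variance objective described in the abstract lends itself to a short illustration. Below is a minimal Python sketch of a lower-confidence-bound strategy on the empirical mean-variance of each arm (variance minus rho times the mean, lower is better). The confidence width, constants, and the risk-tolerance parameter `rho` are illustrative assumptions, not the paper's exact MV-LCB specification.

```python
import numpy as np

def mean_variance_lcb(arms, horizon, rho=1.0, rng=None):
    """Sketch of a risk-averse bandit: play the arm with the lowest
    lower-confidence bound on its empirical mean-variance.

    The score variance - rho * mean is the mean-variance criterion
    (lower is better); the sqrt(log t / n) width and its constant are
    assumed for illustration, not the paper's bounds.
    """
    rng = rng or np.random.default_rng()
    K = len(arms)
    rewards = [[] for _ in range(K)]

    # Pull each arm once so the empirical estimates are defined.
    for i in range(K):
        rewards[i].append(arms[i](rng))

    for t in range(K, horizon):
        scores = []
        for i in range(K):
            x = np.asarray(rewards[i])
            mv = x.var() - rho * x.mean()                 # empirical mean-variance
            width = np.sqrt(2.0 * np.log(t + 1) / len(x))  # assumed confidence width
            scores.append(mv - (1.0 + rho) * width)        # optimistic (low) score
        i_star = int(np.argmin(scores))
        rewards[i_star].append(arms[i_star](rng))
    return rewards

# Usage: the second arm has a higher mean but much higher variance,
# so a risk-averse learner with small rho should favor the first arm.
arms = [lambda rng: rng.normal(0.5, 0.05),
        lambda rng: rng.normal(0.6, 0.5)]
history = mean_variance_lcb(arms, horizon=2000, rho=0.1)
print([len(h) for h in history])
```

Because the mean-variance score is minimized rather than maximized, the confidence width is subtracted from the score, which is the minimization analogue of optimism in standard UCB.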

Cite

Text

Sani et al. "Risk-Aversion in Multi-Armed Bandits." Neural Information Processing Systems, 2012.

Markdown

[Sani et al. "Risk-Aversion in Multi-Armed Bandits." Neural Information Processing Systems, 2012.](https://mlanthology.org/neurips/2012/sani2012neurips-riskaversion/)

BibTeX

@inproceedings{sani2012neurips-riskaversion,
  title     = {{Risk-Aversion in Multi-Armed Bandits}},
  author    = {Sani, Amir and Lazaric, Alessandro and Munos, Rémi},
  booktitle = {Neural Information Processing Systems},
  year      = {2012},
  pages     = {3275--3283},
  url       = {https://mlanthology.org/neurips/2012/sani2012neurips-riskaversion/}
}