Risk-Aversion in Multi-Armed Bandits
Abstract
In stochastic multi-armed bandits the objective is to solve the exploration-exploitation dilemma and ultimately maximize the expected reward. Nonetheless, in many practical problems, maximizing the expected reward is not the most desirable objective. In this paper, we introduce a novel setting based on the principle of risk-aversion where the objective is to compete against the arm with the best risk-return trade-off. This setting proves to be intrinsically more difficult than the standard multi-armed bandit setting due in part to an exploration risk which introduces a regret associated with the variability of an algorithm. Using variance as a measure of risk, we introduce two new algorithms, we investigate their theoretical guarantees, and we report preliminary empirical results.
Cite
Text
Sani et al. "Risk-Aversion in Multi-Armed Bandits." Neural Information Processing Systems, 2012.
Markdown
[Sani et al. "Risk-Aversion in Multi-Armed Bandits." Neural Information Processing Systems, 2012.](https://mlanthology.org/neurips/2012/sani2012neurips-riskaversion/)
BibTeX
@inproceedings{sani2012neurips-riskaversion,
title = {{Risk-Aversion in Multi-Armed Bandits}},
author = {Sani, Amir and Lazaric, Alessandro and Munos, Rémi},
booktitle = {Neural Information Processing Systems},
year = {2012},
pages = {3275-3283},
url = {https://mlanthology.org/neurips/2012/sani2012neurips-riskaversion/}
}