Risk-Averse Stochastic Convex Bandit

Abstract

Motivated by applications in clinical trials and finance, we study the problem of online convex optimization with bandit feedback where the decision maker is risk-averse. We provide two algorithms for this problem. The first is a descent-type algorithm that is easy to implement. The second, which combines the ellipsoid method and a center point device, achieves (almost) optimal regret bounds with respect to the number of rounds. To the best of our knowledge, this is the first attempt to address risk-aversion in the online convex bandit problem.

Cite

Text

Cardoso and Xu. "Risk-Averse Stochastic Convex Bandit." Artificial Intelligence and Statistics, 2019.

Markdown

[Cardoso and Xu. "Risk-Averse Stochastic Convex Bandit." Artificial Intelligence and Statistics, 2019.](https://mlanthology.org/aistats/2019/cardoso2019aistats-riskaverse/)

BibTeX

@inproceedings{cardoso2019aistats-riskaverse,
  title     = {{Risk-Averse Stochastic Convex Bandit}},
  author    = {Cardoso, Adrian Rivera and Xu, Huan},
  booktitle = {Artificial Intelligence and Statistics},
  year      = {2019},
  pages     = {39--47},
  volume    = {89},
  url       = {https://mlanthology.org/aistats/2019/cardoso2019aistats-riskaverse/}
}