Optimizing the CVaR via Sampling

Abstract

Conditional Value at Risk (CVaR) is a prominent risk measure that is used extensively in various domains. We develop a new formula for the gradient of the CVaR in the form of a conditional expectation. Based on this formula, we propose a novel sampling-based estimator for the gradient of the CVaR, in the spirit of the likelihood-ratio method. We analyze the bias of the estimator, and prove the convergence of a corresponding stochastic gradient descent algorithm to a local CVaR optimum. Our method makes it possible to consider CVaR optimization in new domains. As an example, we consider a reinforcement learning application, and learn a risk-sensitive controller for the game of Tetris.
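
The abstract describes a likelihood-ratio style estimator of the CVaR gradient, used inside stochastic gradient ascent. The following is a minimal sketch of that idea on a one-dimensional toy problem, assuming rewards distributed as N(theta, 1) and taking CVaR as the expectation of the lower alpha-tail of the reward. The helper names (cvar_gradient_estimate, sample_fn, grad_log_pdf) and the choices of sample size, step size, and tail level alpha are illustrative assumptions, not taken from the paper.

import numpy as np

def cvar_gradient_estimate(theta, sample_fn, grad_log_pdf, alpha, n_samples):
    """Sampling-based, likelihood-ratio style estimate of d/dtheta CVaR_alpha.

    sample_fn(theta, n)    -> n reward samples drawn from f_theta
    grad_log_pdf(x, theta) -> d/dtheta log f_theta(x), evaluated per sample
    alpha                  -> tail level (e.g. 0.05 = worst 5% of rewards)
    """
    x = sample_fn(theta, n_samples)
    v_hat = np.quantile(x, alpha)          # empirical alpha-quantile (VaR estimate)
    tail = x <= v_hat                      # samples falling in the lower alpha-tail
    if not tail.any():
        return 0.0
    # Average the likelihood-ratio term over the tail samples, approximating the
    # conditional-expectation form of the CVaR gradient.
    return np.mean(grad_log_pdf(x[tail], theta) * (x[tail] - v_hat))

# Toy usage: rewards ~ N(theta, 1); ascend the CVaR of the lower 5% tail.
def sample_fn(theta, n):
    return np.random.normal(theta, 1.0, size=n)

def grad_log_pdf(x, theta):
    return x - theta                       # d/dtheta log N(x; theta, 1)

theta, alpha, lr = 0.0, 0.05, 0.1
for _ in range(200):
    theta += lr * cvar_gradient_estimate(theta, sample_fn, grad_log_pdf, alpha, 2000)

In this Gaussian toy case the CVaR shifts one-for-one with theta, so the estimated gradient should concentrate around 1 as the sample size grows, which gives a quick sanity check on the estimator.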

Cite

Text

Tamar et al. "Optimizing the CVaR via Sampling." AAAI Conference on Artificial Intelligence, 2015. doi:10.1609/AAAI.V29I1.9561

Markdown

[Tamar et al. "Optimizing the CVaR via Sampling." AAAI Conference on Artificial Intelligence, 2015.](https://mlanthology.org/aaai/2015/tamar2015aaai-optimizing/) doi:10.1609/AAAI.V29I1.9561

BibTeX

@inproceedings{tamar2015aaai-optimizing,
  title     = {{Optimizing the CVaR via Sampling}},
  author    = {Tamar, Aviv and Glassner, Yonatan and Mannor, Shie},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2015},
  pages     = {2993-2999},
  doi       = {10.1609/AAAI.V29I1.9561},
  url       = {https://mlanthology.org/aaai/2015/tamar2015aaai-optimizing/}
}