Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient

Abstract

This paper addresses a policy optimization task with the conditional value-at-risk (CVaR) objective. We introduce the predictive CVaR policy gradient, a novel approach that can be integrated into risk-neutral policy gradient algorithms with minimal modifications. Our method incorporates a reweighting strategy in the gradient calculation: individual cost terms are reweighted in proportion to their predicted contribution to the objective. These weights can be estimated through a separate learning procedure. We provide theoretical and empirical analyses demonstrating the validity and effectiveness of the proposed method.
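To make the CVaR objective and the idea of reweighting cost terms concrete, the following minimal sketch computes an empirical CVaR from sampled costs, along with the per-sample weights that reproduce it as a weighted sum. It is not the paper's predictive method: it uses the plain sample-based tail average, and the function names (`empirical_cvar`, `tail_weights`) and the `tail_fraction` convention (the share of worst-case samples averaged over) are assumptions made for illustration only.

```python
import math


def empirical_cvar(costs, tail_fraction):
    """Mean of the worst (largest) `tail_fraction` share of sampled costs.

    Assumed convention: tail_fraction = 0.1 averages the worst 10% of costs.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 < tail_fraction <= 1.0:
        raise ValueError("tail_fraction must be in (0, 1]")
    k = math.ceil(tail_fraction * len(costs))  # number of tail samples
    worst = sorted(costs, reverse=True)[:k]
    return sum(worst) / k


def tail_weights(costs, tail_fraction):
    """Per-sample weights: 1/k for the k worst costs, 0 for the rest."""
    k = math.ceil(tail_fraction * len(costs))
    # Indices of samples ordered from highest cost to lowest cost.
    order = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    weights = [0.0] * len(costs)
    for i in order[:k]:
        weights[i] = 1.0 / k
    return weights


# Hypothetical sampled episode costs.
costs = [1.0, 2.0, 3.0, 10.0]
cvar = empirical_cvar(costs, 0.5)
weights = tail_weights(costs, 0.5)
weighted_sum = sum(w * c for w, c in zip(weights, costs))
```

In this sample-based form, only the tail samples receive nonzero weight, so the weighted sum of costs equals the empirical CVaR. The abstract describes a different scheme, in which the weights are predicted through a separate learning procedure rather than assigned after sorting the sampled costs.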

Cite

Text

Kim and Min. "Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient." International Conference on Machine Learning, 2024.

Markdown

[Kim and Min. "Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/kim2024icml-risksensitive/)

BibTeX

@inproceedings{kim2024icml-risksensitive,
  title     = {{Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient}},
  author    = {Kim, Ju-Hyun and Min, Seungki},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {24354--24369},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/kim2024icml-risksensitive/}
}