Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient
Abstract
This paper addresses policy optimization under the conditional value-at-risk (CVaR) objective. We introduce the predictive CVaR policy gradient, a novel approach that can be integrated into risk-neutral policy gradient algorithms with minimal modifications. Our method incorporates a reweighting strategy in the gradient calculation: individual cost terms are reweighted in proportion to their predicted contribution to the objective. These weights can be easily estimated through a separate learning procedure. We provide theoretical and empirical analyses demonstrating the validity and effectiveness of the proposed method.
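For readers unfamiliar with the CVaR objective, the following is a minimal illustrative sketch and not the paper's algorithm. It assumes a convention in which CVaR at level `alpha` is the mean of the worst (largest) `alpha`-fraction of sampled costs, and it shows the generic sample-based tail weighting that a risk-neutral gradient estimate could be multiplied by. The function names `empirical_cvar` and `tail_weights` are hypothetical, and the weights here come directly from observed trajectory costs rather than from the learned predictor the abstract describes.

```python
import numpy as np

def empirical_cvar(costs, alpha):
    """Return (VaR, CVaR) of sampled costs at tail level alpha.

    Assumed convention: VaR is the (1 - alpha)-quantile of the costs,
    and CVaR is the mean of the costs at or above that threshold.
    """
    costs = np.asarray(costs, dtype=float)
    var = np.quantile(costs, 1.0 - alpha)
    tail = costs[costs >= var]
    return var, tail.mean()

def tail_weights(costs, alpha):
    """Per-sample weights: 1/alpha for samples in the tail, 0 otherwise.

    This is the generic sample-based weighting, used here only as an
    illustration; it is NOT the predictive weighting from the paper.
    """
    costs = np.asarray(costs, dtype=float)
    var = np.quantile(costs, 1.0 - alpha)
    return (costs >= var).astype(float) / alpha

# Example: ten sampled trajectory costs, worst 20% tail.
sample_costs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
var, cvar = empirical_cvar(sample_costs, alpha=0.2)
weights = tail_weights(sample_costs, alpha=0.2)
```

With these weights, the weighted average of the sampled costs equals the empirical CVaR, which is why reweighting cost terms is enough to turn a risk-neutral estimator into a risk-sensitive one in this simplified setting.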
Cite
Text
Kim and Min. "Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient." International Conference on Machine Learning, 2024.
Markdown
[Kim and Min. "Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/kim2024icml-risksensitive/)
BibTeX
@inproceedings{kim2024icml-risksensitive,
title = {{Risk-Sensitive Policy Optimization via Predictive CVaR Policy Gradient}},
author = {Kim, Ju-Hyun and Min, Seungki},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {24354-24369},
volume = {235},
url = {https://mlanthology.org/icml/2024/kim2024icml-risksensitive/}
}