An Alternative to Variance: Gini Deviation for Risk-Averse Policy Gradient

Abstract

Restricting the variance of a policy’s return is a popular choice in risk-averse Reinforcement Learning (RL) due to its clear mathematical definition and easy interpretability. Traditional methods directly restrict the total return variance; recent methods restrict the per-step reward variance as a proxy. We thoroughly examine the limitations of these variance-based methods, such as sensitivity to numerical scale and hindrance to policy learning, and propose an alternative risk measure, Gini deviation, as a substitute. We study various properties of this new risk measure and derive a policy gradient algorithm to minimize it. Empirical evaluation in domains where risk-aversion can be clearly defined shows that our algorithm can mitigate the limitations of variance-based risk measures and achieves high return with low risk, in terms of both variance and Gini deviation, when others fail to learn a reasonable policy.
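
As a minimal illustrative sketch (not the paper's algorithm), the snippet below estimates Gini deviation from sampled returns and compares it with variance, assuming the common definition GD(X) = 0.5 * E|X1 - X2| for i.i.d. copies X1, X2 of X. The function name and the simulated return samples are hypothetical.

import numpy as np

def gini_deviation(returns):
    # Sample estimate of Gini deviation: half the mean absolute difference
    # between i.i.d. copies of the return, GD(X) = 0.5 * E|X1 - X2|
    # (a common definition; the paper may use an equivalent quantile form).
    x = np.asarray(returns, dtype=float)
    n = x.size
    diffs = np.abs(x[:, None] - x[None, :])     # all pairwise |x_i - x_j|
    return 0.5 * diffs.sum() / (n * (n - 1))    # average over i != j pairs

rng = np.random.default_rng(0)
# Hypothetical per-episode returns from a low-risk and a high-risk policy.
safe_returns  = rng.normal(loc=8.0,  scale=1.0, size=1000)
risky_returns = rng.normal(loc=10.0, scale=5.0, size=1000)

for name, r in [("safe", safe_returns), ("risky", risky_returns)]:
    print(f"{name}: mean={r.mean():.2f}  var={r.var():.2f}  "
          f"gini_dev={gini_deviation(r):.2f}")

One way the scale-sensitivity issue mentioned in the abstract shows up: rescaling rewards by a factor c scales variance by c^2, whereas Gini deviation (like standard deviation) scales only by |c|.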

Cite

Text

Luo et al. "An Alternative to Variance: Gini Deviation for Risk-Averse Policy Gradient." Neural Information Processing Systems, 2023.

Markdown

[Luo et al. "An Alternative to Variance: Gini Deviation for Risk-Averse Policy Gradient." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/luo2023neurips-alternative/)

BibTeX

@inproceedings{luo2023neurips-alternative,
  title     = {{An Alternative to Variance: Gini Deviation for Risk-Averse Policy Gradient}},
  author    = {Luo, Yudong and Liu, Guiliang and Poupart, Pascal and Pan, Yangchen},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/luo2023neurips-alternative/}
}