Mean-Variance Efficient Reinforcement Learning by Expected Quadratic Utility Maximization

Abstract

In reinforcement learning (RL) for sequential decision making under uncertainty, existing methods that account for the mean-variance (MV) trade-off suffer from computational difficulties in estimating the gradient of the variance term. In this paper, we aim to obtain MV-efficient policies, i.e., policies that are Pareto efficient with respect to the MV trade-off. To this end, we train an agent to maximize an expected quadratic utility function, whose maximizer corresponds to a Pareto-efficient policy. Our approach avoids these computational difficulties because it does not require gradient estimation of the variance. In experiments, we confirm the effectiveness of our proposed methods.
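A minimal sketch of the underlying mean-variance connection, assuming the standard quadratic utility u(R) = R - (λ/2) R² of the return R with a risk-aversion parameter λ > 0 (this notation is ours and may differ from the paper's):

\[
\mathbb{E}[u(R)] = \mathbb{E}[R] - \tfrac{\lambda}{2}\,\mathbb{E}[R^2] = \mathbb{E}[R] - \tfrac{\lambda}{2}\left(\operatorname{Var}[R] + \mathbb{E}[R]^2\right).
\]

Maximizing this objective therefore trades off the mean of the return against its variance while involving only plain expectations, so no separate gradient estimate of the variance term is needed.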

Cite

Text

Kato et al. "Mean-Variance Efficient Reinforcement Learning by Expected Quadratic Utility Maximization." NeurIPS 2021 Workshops: DeepRL, 2021.

Markdown

[Kato et al. "Mean-Variance Efficient Reinforcement Learning by Expected Quadratic Utility Maximization." NeurIPS 2021 Workshops: DeepRL, 2021.](https://mlanthology.org/neuripsw/2021/kato2021neuripsw-meanvariance/)

BibTeX

@inproceedings{kato2021neuripsw-meanvariance,
  title     = {{Mean-Variance Efficient Reinforcement Learning by Expected Quadratic Utility Maximization}},
  author    = {Kato, Masahiro and Nakagawa, Kei and Abe, Kenshi and Morimura, Tetsuro},
  booktitle = {NeurIPS 2021 Workshops: DeepRL},
  year      = {2021},
  url       = {https://mlanthology.org/neuripsw/2021/kato2021neuripsw-meanvariance/}
}