Nonparametric Return Distribution Approximation for Reinforcement Learning

Abstract

Standard Reinforcement Learning (RL) aims to optimize decision-making rules in terms of the expected return. However, especially for risk-management purposes, other criteria such as the expected shortfall are sometimes preferred. Here, we describe a method of approximating the distribution of returns, which allows us to derive various kinds of information about the returns. We first show that the Bellman equation, which is a recursive formula for the expected return, can be extended to the cumulative return distribution. We then derive a nonparametric return distribution estimator with particle smoothing based on this extended Bellman equation. A key aspect of the proposed algorithm is that it represents the recursion in the extended Bellman equation as a simple procedure that replaces the particles associated with a state by those of its successor state. We show that our algorithm leads to a risk-sensitive RL paradigm, and we demonstrate the usefulness of the proposed approach through numerical experiments.
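The particle-replacement idea from the abstract can be illustrated with a minimal sketch. The code below is not the paper's algorithm (which uses particle smoothing in a general MDP); it only shows the core recursion on a hypothetical two-state deterministic chain: each state's return particles are replaced by `reward + gamma * (resampled particles of the successor state)`, the distributional analogue of the Bellman backup.

```python
import numpy as np

rng = np.random.default_rng(0)

def particle_bellman_update(particles, transitions, gamma=0.95):
    """One sweep of a particle-based distributional Bellman update.

    particles:   dict state -> 1-D array of return samples (particles).
    transitions: dict state -> (reward, next_state); a deterministic
                 illustrative MDP, far simpler than the paper's setting.

    Each state's particles are replaced by
        reward + gamma * (resampled particles of the successor state),
    mirroring the replacement procedure described in the abstract.
    """
    new = {}
    for s, (r, s_next) in transitions.items():
        succ = particles[s_next]
        # Resample with replacement so the particle set size stays fixed.
        resampled = rng.choice(succ, size=succ.shape[0], replace=True)
        new[s] = r + gamma * resampled
    return new

# Hypothetical chain: s0 --(r=1)--> s1 --(r=0)--> s1 (absorbing).
transitions = {"s0": (1.0, "s1"), "s1": (0.0, "s1")}
particles = {s: np.zeros(100) for s in transitions}
for _ in range(200):
    particles = particle_bellman_update(particles, transitions)
```

With zero reward at the absorbing state, every particle at `s0` converges to the deterministic return 1.0; with stochastic rewards or transitions, the particle set would instead approximate the full return distribution, from which risk measures such as the expected shortfall can be read off.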

Cite

Text

Morimura et al. "Nonparametric Return Distribution Approximation for Reinforcement Learning." International Conference on Machine Learning, 2010.

Markdown

[Morimura et al. "Nonparametric Return Distribution Approximation for Reinforcement Learning." International Conference on Machine Learning, 2010.](https://mlanthology.org/icml/2010/morimura2010icml-nonparametric/)

BibTeX

@inproceedings{morimura2010icml-nonparametric,
  title     = {{Nonparametric Return Distribution Approximation for Reinforcement Learning}},
  author    = {Morimura, Tetsuro and Sugiyama, Masashi and Kashima, Hisashi and Hachiya, Hirotaka and Tanaka, Toshiyuki},
  booktitle = {International Conference on Machine Learning},
  year      = {2010},
  pages     = {799--806},
  url       = {https://mlanthology.org/icml/2010/morimura2010icml-nonparametric/}
}