Approximate Relative Value Learning for Average-Reward Continuous State MDPs

Abstract

In this paper, we propose an approximate relative value learning (ARVL) algorithm for non-parametric MDPs with continuous state space, finite actions, and the average-reward criterion. It is a sampling-based algorithm that combines kernel density estimation with function approximation via nearest neighbors. The theoretical analysis is carried out via a random contraction operator framework and a stochastic dominance argument. To the best of our knowledge, this is the first algorithm for continuous state space MDPs with the average-reward criterion that has these provable properties and does not require any discretization of the state space. We then evaluate the proposed algorithm numerically on a benchmark problem.
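To make the ingredients in the abstract concrete, the sketch below shows one plausible way relative value iteration can be carried out on a finite set of sampled anchor states, with the relative value function read off at arbitrary continuous states by nearest-neighbor averaging and expectations estimated from sampled transitions. This is only a minimal illustration, not the paper's exact operator (which also involves kernel density estimation and the random contraction analysis); the helpers `reward` and `sample_next_state`, the use of anchor 0 as the reference state, and all parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def arvl_sketch(anchors, actions, reward, sample_next_state,
                n_samples=20, n_iters=100, k=5):
    """Relative value iteration over a finite set of sampled anchor states.

    anchors: (N, d) array of sampled states; reward(s, a) -> float;
    sample_next_state(s, a) -> one next state drawn from the transition kernel.
    """
    n = len(anchors)
    h = np.zeros(n)                      # relative value estimates at the anchors
    nn = NearestNeighbors(n_neighbors=k).fit(anchors)

    def h_at(state):
        # Read off h at an arbitrary continuous state by averaging
        # its values over the k nearest anchor states.
        _, idx = nn.kneighbors(np.asarray(state).reshape(1, -1))
        return h[idx[0]].mean()

    rho = 0.0                            # estimate of the optimal average reward
    for _ in range(n_iters):
        th = np.empty(n)
        for i, s in enumerate(anchors):
            q_vals = []
            for a in actions:
                # Monte Carlo estimate of E[h(s') | s, a] from sampled next states
                exp_h = np.mean([h_at(sample_next_state(s, a))
                                 for _ in range(n_samples)])
                q_vals.append(reward(s, a) + exp_h)
            th[i] = max(q_vals)
        rho = th[0] - h[0]               # anchor 0 plays the role of the reference state
        h = th - th[0]                   # subtract the reference value to keep h bounded
    return h, rho
```

The subtraction of the reference-state value each iteration is what keeps the iterates bounded under the average-reward criterion, in contrast to discounted value iteration where a discount factor provides the contraction.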

Cite

Text

Sharma et al. "Approximate Relative Value Learning for Average-Reward Continuous State MDPs." Uncertainty in Artificial Intelligence, 2019.

Markdown

[Sharma et al. "Approximate Relative Value Learning for Average-Reward Continuous State MDPs." Uncertainty in Artificial Intelligence, 2019.](https://mlanthology.org/uai/2019/sharma2019uai-approximate/)

BibTeX

@inproceedings{sharma2019uai-approximate,
  title     = {{Approximate Relative Value Learning for Average-Reward Continuous State MDPs}},
  author    = {Sharma, Hiteshi and Jafarnia-Jahromi, Mehdi and Jain, Rahul},
  booktitle = {Uncertainty in Artificial Intelligence},
  year      = {2019},
  pages     = {956-964},
  volume    = {115},
  url       = {https://mlanthology.org/uai/2019/sharma2019uai-approximate/}
}