Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs

Abstract

In this paper we address reinforcement learning problems with continuous state-action spaces. We propose a new algorithm, fitted natural actor-critic (FNAC), that extends the work in [1] to allow for general function approximation and data reuse. We combine the natural actor-critic architecture [1] with a variant of fitted value iteration using importance sampling. The resulting method combines the appealing features of both approaches while overcoming their main weaknesses: the gradient-based actor readily overcomes the difficulties that regression methods face with policy optimization in continuous action spaces; in turn, the regression-based critic makes efficient use of data and avoids the convergence problems that TD-based critics often exhibit. We establish the convergence of our algorithm and illustrate its application in a simple problem with continuous state and action spaces.
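The abstract names two ingredients: a natural-gradient actor and a regression-based critic fitted with importance sampling so that old data can be reused. The following is a minimal sketch of how those pieces can fit together. It is not the authors' FNAC algorithm. It assumes a hypothetical one-step problem, so the critic reduces to a single importance-weighted least-squares fit of the immediate reward rather than iterated fitted value iteration. It also assumes a linear-Gaussian policy a ~ N(theta * s, SIGMA^2) with fixed SIGMA, and it relies on the standard compatible-features result: the regression coefficient on grad_theta log pi is the natural gradient. All names (`K_TARGET`, `SIGMA`, `fitted_critic_weights`, `fnac_style_sketch`) are illustrative.

```python
import numpy as np

# Hypothetical toy problem (not from the paper): one step, continuous state and action.
K_TARGET = 2.0  # reward is maximal when a == K_TARGET * s
SIGMA = 0.5     # fixed std of the learner's Gaussian policy a ~ N(theta * s, SIGMA^2)


def reward(s, a):
    return -(a - K_TARGET * s) ** 2


def gaussian_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def fitted_critic_weights(theta, s, a, r, behavior_mean, behavior_std):
    """Importance-weighted least-squares fit of r ~ w * psi + v0 + v1 * s^2.

    psi = d/dtheta log pi(a|s) is the compatible feature; [1, s^2] are
    state-only baseline features. The coefficient w on psi is the
    natural-gradient estimate for the current policy.
    """
    psi = (a - theta * s) * s / SIGMA ** 2
    X = np.column_stack([psi, np.ones_like(s), s ** 2])
    # Importance weights reweight behavior-policy samples to the current policy.
    rho = gaussian_pdf(a, theta * s, SIGMA) / gaussian_pdf(a, behavior_mean, behavior_std)
    sw = np.sqrt(rho)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], r * sw, rcond=None)
    return coef[0]


def fnac_style_sketch(n_samples=5000, n_iters=30, step=1.0, seed=0):
    rng = np.random.default_rng(seed)
    # One fixed batch from a broad behavior policy, reused at every iteration.
    behavior_mean, behavior_std = 0.0, 2.0
    s = rng.uniform(-1.0, 1.0, n_samples)
    a = rng.normal(behavior_mean, behavior_std, n_samples)
    r = reward(s, a)
    theta = 0.0
    for _ in range(n_iters):
        w = fitted_critic_weights(theta, s, a, r, behavior_mean, behavior_std)
        theta += step * w  # natural-gradient ascent step on the actor
    return theta
```

For this toy problem the natural gradient works out to -2 * SIGMA^2 * (theta - K_TARGET), so with `step=1.0` each update roughly halves the distance to `K_TARGET`. Calling `fnac_style_sketch()` should therefore bring `theta` close to 2.0, up to sampling error from the reused batch.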

Cite

Text

Melo and Lopes. "Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2008. doi:10.1007/978-3-540-87481-2_5


BibTeX

@inproceedings{melo2008ecmlpkdd-fitted,
  title     = {{Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs}},
  author    = {Melo, Francisco S. and Lopes, Manuel},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2008},
  pages     = {66--81},
  doi       = {10.1007/978-3-540-87481-2_5},
  url       = {https://mlanthology.org/ecmlpkdd/2008/melo2008ecmlpkdd-fitted/}
}