Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs
Abstract
In this paper we address reinforcement learning problems with continuous state-action spaces. We propose a new algorithm, fitted natural actor-critic (FNAC), that extends the work in [1] to allow for general function approximation and data reuse. We combine the natural actor-critic architecture [1] with a variant of fitted value iteration using importance sampling. The method thus obtained combines the appealing features of both approaches while overcoming their main weaknesses: the use of a gradient-based actor readily overcomes the difficulties that regression-based methods encounter when optimizing policies over continuous action spaces; in turn, the use of a regression-based critic allows for efficient use of data and avoids the convergence problems that TD-based critics often exhibit. We establish the convergence of our algorithm and illustrate its application in a simple problem with continuous states and actions.
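To make the actor-critic split concrete, the following is a minimal sketch of an episodic natural actor-critic in the spirit the abstract describes: a least-squares (regression-based) critic fit on compatible features, whose coefficient directly yields the natural-gradient actor update. It is not the authors' FNAC algorithm (no fitted value iteration or importance sampling here); the toy linear dynamics, quadratic cost, feature set, and step sizes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

theta, sigma = 0.0, 0.5   # linear-Gaussian policy: a ~ N(theta * s, sigma^2)
alpha, gamma = 0.01, 0.95  # actor step size, discount factor

def run_episode(theta, T=20):
    """Roll out one episode on toy linear dynamics with quadratic cost."""
    s, traj = rng.normal(), []
    for _ in range(T):
        a = theta * s + sigma * rng.normal()
        r = -(s ** 2) - 0.1 * a ** 2          # illustrative quadratic cost
        traj.append((s, a, r))
        s = s + a + 0.1 * rng.normal()        # illustrative linear dynamics
    return traj

for _ in range(200):
    # Collect a batch of transitions with Monte Carlo returns.
    X, y = [], []
    for _ in range(10):
        G = 0.0
        for s, a, r in reversed(run_episode(theta)):
            G = r + gamma * G
            # Compatible feature: grad_theta log pi(a|s) for the Gaussian policy.
            psi = (a - theta * s) * s / sigma ** 2
            X.append([psi, 1.0, s, s ** 2])   # [compatible feature, value features]
            y.append(G)
    # Regression-based critic: least-squares fit of returns. By the compatible
    # function approximation argument, the coefficient on psi is the natural
    # gradient estimate w, so the actor step is simply theta += alpha * w.
    w = np.linalg.lstsq(np.asarray(X), np.asarray(y), rcond=None)[0][0]
    theta = float(np.clip(theta + alpha * w, -1.8, 0.2))  # clip for stability

print(round(theta, 3))  # learned gain; a stabilizing (negative) value
```

The key property illustrated is the one the abstract relies on: because the critic is fit by regression rather than temporal-difference bootstrapping, each actor update reduces to reading off a single regression coefficient, and the gradient-based actor never needs to maximize over a continuous action set.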
Cite
Text
Melo and Lopes. "Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2008. doi:10.1007/978-3-540-87481-2_5
Markdown
[Melo and Lopes. "Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2008.](https://mlanthology.org/ecmlpkdd/2008/melo2008ecmlpkdd-fitted/) doi:10.1007/978-3-540-87481-2_5
BibTeX
@inproceedings{melo2008ecmlpkdd-fitted,
title = {{Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs}},
author = {Melo, Francisco S. and Lopes, Manuel},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2008},
  pages = {66--81},
doi = {10.1007/978-3-540-87481-2_5},
url = {https://mlanthology.org/ecmlpkdd/2008/melo2008ecmlpkdd-fitted/}
}