Policy Gradient in Lipschitz Markov Decision Processes
Abstract
This paper investigates how Lipschitz continuity properties of Markov Decision Processes can be exploited to safely speed up policy-gradient algorithms. Starting from assumptions on the Lipschitz continuity of the state-transition model, the reward function, and the policies considered in the learning process, we show that both the expected return of a policy and its gradient are Lipschitz continuous w.r.t. the policy parameters. Leveraging these properties, we define policy-parameter updates that guarantee a performance improvement at each iteration. The proposed methods are empirically evaluated and compared with related approaches on different configurations of three popular control scenarios: the linear quadratic regulator, the mass-spring-damper system, and ship-steering control.
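The core idea behind such guaranteed-improvement updates can be illustrated with a standard smoothness argument: if the gradient of the objective is Lipschitz with constant L, a gradient-ascent step of size 1/L can never decrease the objective. The sketch below is a toy illustration of that principle on a concave quadratic surrogate for the expected return, not the paper's actual algorithm; the function `J`, its coefficients `a` and `b`, and the constant `L` are all illustrative assumptions.

```python
# Toy sketch: gradient ascent with step size 1/L on a function whose
# gradient is L-Lipschitz improves the objective at every iteration.
# J, a, b, and L below are illustrative, not from the paper.

def J(theta, a=2.0, b=3.0):
    """Concave quadratic standing in for the expected return."""
    return -a * theta ** 2 + b * theta

def grad_J(theta, a=2.0, b=3.0):
    """Gradient of J; Lipschitz with constant L = 2 * a."""
    return -2.0 * a * theta + b

L = 2.0 * 2.0          # Lipschitz constant of grad_J (|J''| = 2a = 4)
theta = 0.0
returns = [J(theta)]
for _ in range(20):
    theta += grad_J(theta) / L   # "safe" step size eta = 1/L
    returns.append(J(theta))

# Monotone improvement at each iteration, as the smoothness bound predicts
assert all(r1 >= r0 for r0, r1 in zip(returns, returns[1:]))
print(theta, returns[-1])
```

With these coefficients the iterates converge toward the maximizer b / (2a) = 0.75 while the recorded returns never decrease, which is the kind of per-iteration guarantee the paper derives (there, with Lipschitz constants computed from the MDP and policy assumptions rather than chosen by hand).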
Cite
Text
Pirotta et al. "Policy Gradient in Lipschitz Markov Decision Processes." Machine Learning, 2015. doi:10.1007/s10994-015-5484-1
Markdown
[Pirotta et al. "Policy Gradient in Lipschitz Markov Decision Processes." Machine Learning, 2015.](https://mlanthology.org/mlj/2015/pirotta2015mlj-policy/) doi:10.1007/s10994-015-5484-1
BibTeX
@article{pirotta2015mlj-policy,
title = {{Policy Gradient in Lipschitz Markov Decision Processes}},
author = {Pirotta, Matteo and Restelli, Marcello and Bascetta, Luca},
journal = {Machine Learning},
year = {2015},
pages = {255-283},
doi = {10.1007/s10994-015-5484-1},
volume = {100},
url = {https://mlanthology.org/mlj/2015/pirotta2015mlj-policy/}
}