Projected Natural Actor-Critic

Abstract

Natural actor-critics are a popular class of policy search algorithms for finding locally optimal policies for Markov decision processes. In this paper we address a drawback of natural actor-critics that limits their real-world applicability: their lack of safety guarantees. We present a principled algorithm for performing natural gradient descent over a constrained domain. In the context of reinforcement learning, this allows for natural actor-critic algorithms that are guaranteed to remain within a known safe region of policy space. While deriving our class of constrained natural actor-critic algorithms, which we call Projected Natural Actor-Critics (PNACs), we also elucidate the relationship between natural gradient descent and mirror descent.
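The core idea of natural gradient descent over a constrained domain can be sketched as follows. This is a minimal illustration, not the paper's PNAC algorithm: the function name, the box-shaped constraint set, and the Euclidean projection are all assumptions made for brevity (the paper treats general convex constraints and projects with respect to the Fisher metric).

```python
import numpy as np

def projected_natural_gradient_step(theta, grad, fisher, lr, lo, hi):
    """One hedged sketch of a projected natural gradient update.

    theta  : current parameter vector
    grad   : gradient of the objective at theta
    fisher : Fisher information matrix estimate (positive definite)
    lr     : step size
    lo, hi : elementwise bounds defining a box-shaped "safe" region
             (a simplified stand-in for a general convex constraint set)
    """
    # Natural gradient: precondition the gradient by the inverse Fisher.
    nat_grad = np.linalg.solve(fisher, grad)
    # Unconstrained natural gradient descent step.
    theta_new = theta - lr * nat_grad
    # Project back into the safe region. A Euclidean projection is used
    # here for simplicity; the paper's algorithm projects with respect
    # to the Fisher information metric.
    return np.clip(theta_new, lo, hi)
```

With the identity Fisher matrix this reduces to ordinary projected gradient descent; the natural gradient becomes distinct as soon as the Fisher matrix differs from the identity, rescaling the step according to the geometry of policy space.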

Cite

Text

Thomas et al. "Projected Natural Actor-Critic." Neural Information Processing Systems, 2013.

Markdown

[Thomas et al. "Projected Natural Actor-Critic." Neural Information Processing Systems, 2013.](https://mlanthology.org/neurips/2013/thomas2013neurips-projected/)

BibTeX

@inproceedings{thomas2013neurips-projected,
  title     = {{Projected Natural Actor-Critic}},
  author    = {Thomas, Philip S. and Dabney, William C. and Giguere, Stephen and Mahadevan, Sridhar},
  booktitle = {Neural Information Processing Systems},
  year      = {2013},
  pages     = {2337--2345},
  url       = {https://mlanthology.org/neurips/2013/thomas2013neurips-projected/}
}