Actor-Critic Algorithms
Abstract
We propose and analyze a class of actor-critic algorithms for simulation-based optimization of a Markov decision process over a parameterized family of randomized stationary policies. These are two-time-scale algorithms in which the critic uses TD learning with a linear approximation architecture and the actor is updated in an approximate gradient direction based on information provided by the critic. We show that the features for the critic should span a subspace prescribed by the choice of parameterization of the actor. We conclude by discussing convergence properties and some open problems.
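To make the abstract's moving parts concrete, here is a minimal illustrative sketch (not the paper's pseudocode) of the two-time-scale idea on a small hypothetical 2-state, 2-action MDP. The critic runs average-reward TD(0) with a linear architecture whose features are the "compatible" score features ∇_θ log π(a|s), i.e. the subspace the abstract says the critic's features should span; the actor takes slower approximate-gradient steps using the critic's estimates. The MDP, step sizes, and all constants are assumptions chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-state, 2-action average-reward MDP (illustration only).
n_states, n_actions = 2, 2
P = np.array([[[0.9, 0.1], [0.2, 0.8]],   # P[s, a, s'] transition probabilities
              [[0.7, 0.3], [0.4, 0.6]]])
R = np.array([[1.0, 0.0],                  # R[s, a] expected one-step rewards
              [0.0, 2.0]])

theta = np.zeros((n_states, n_actions))    # actor: tabular softmax policy parameters
w = np.zeros(n_states * n_actions)         # critic: weights of a linear architecture

def policy(s):
    p = np.exp(theta[s] - theta[s].max())
    return p / p.sum()

def features(s, a):
    # Compatible features: the score grad_theta log pi(a|s), flattened.
    grad = np.zeros((n_states, n_actions))
    grad[s] = -policy(s)
    grad[s, a] += 1.0
    return grad.ravel()

s = 0
avg_r = 0.0                                # running estimate of the average reward
for t in range(50_000):
    a = rng.choice(n_actions, p=policy(s))
    s_next = rng.choice(n_states, p=P[s, a])
    a_next = rng.choice(n_actions, p=policy(s_next))
    r = R[s, a]

    # Critic: TD(0) on the fast time scale (larger step size).
    phi, phi_next = features(s, a), features(s_next, a_next)
    delta = r - avg_r + w @ phi_next - w @ phi
    avg_r += 0.01 * (r - avg_r)
    w += 0.05 * delta * phi

    # Actor: slower time scale; with compatible features, w @ phi estimates
    # the advantage, so this is an approximate gradient step.
    theta += (0.005 * (w @ phi) * phi).reshape(theta.shape)
    s = s_next

print("learned policy per state:", [policy(s).round(3) for s in range(n_states)])
```

The separation of step sizes (0.05 for the critic versus 0.005 for the actor) is what makes this a two-time-scale scheme: the critic effectively equilibrates before each actor update.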
Cite

Text

Konda and Tsitsiklis. "Actor-Critic Algorithms." Neural Information Processing Systems, 1999.

Markdown

[Konda and Tsitsiklis. "Actor-Critic Algorithms." Neural Information Processing Systems, 1999.](https://mlanthology.org/neurips/1999/konda1999neurips-actorcritic/)

BibTeX
@inproceedings{konda1999neurips-actorcritic,
  title     = {{Actor-Critic Algorithms}},
  author    = {Konda, Vijay R. and Tsitsiklis, John N.},
  booktitle = {Neural Information Processing Systems},
  year      = {1999},
  pages     = {1008--1014},
  url       = {https://mlanthology.org/neurips/1999/konda1999neurips-actorcritic/}
}