Policy Gradient Coagent Networks
Abstract
We present a novel class of actor-critic algorithms for actors consisting of sets of interacting modules. We derive, analyze theoretically, and empirically evaluate an update rule for each module, which requires only local information: the module's input, output, and the TD error broadcast by a critic. Such updates are necessary when computation of compatible features becomes prohibitively difficult, and are also desirable for increasing the biological plausibility of reinforcement learning methods.
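The local update rule described above lends itself to a short illustration. The sketch below is a minimal reading of the idea, not the paper's implementation: each coagent is a softmax policy over a discrete set of outputs, and after acting it adjusts only its own parameters with Δθᵢ = α · δ · ∇_{θᵢ} log πᵢ(aᵢ | xᵢ), where δ is the TD error broadcast by a shared critic. The `Coagent` class, the one-hot wiring between modules, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

class Coagent:
    """One module in the network: a softmax (Boltzmann) policy over a
    discrete set of outputs, conditioned on its local input vector.
    (Illustrative sketch; not the paper's architecture.)"""

    def __init__(self, n_inputs, n_outputs, lr=0.01, rng=None):
        self.theta = np.zeros((n_outputs, n_inputs))  # local policy weights
        self.lr = lr
        self.rng = rng or np.random.default_rng()
        self._x = None  # last input seen by this module
        self._a = None  # last output sampled by this module

    def _probs(self, x):
        logits = self.theta @ x
        logits -= logits.max()  # numerical stability
        p = np.exp(logits)
        return p / p.sum()

    def act(self, x):
        """Sample an output from the module's local policy and remember
        the (input, output) pair for the next update."""
        p = self._probs(x)
        a = self.rng.choice(len(p), p=p)
        self._x, self._a = x, a
        return a

    def update(self, td_error):
        """Local policy-gradient step: theta += lr * delta * grad log pi(a|x).
        Uses only this module's input/output and the broadcast TD error."""
        p = self._probs(self._x)
        grad = -np.outer(p, self._x)  # d log pi(a|x) / d theta, all rows
        grad[self._a] += self._x      # correction row for the sampled output
        self.theta += self.lr * td_error * grad

# Example wiring: module 1 consumes the observation, module 2 consumes a
# one-hot encoding of module 1's output; both receive the same TD error.
obs_dim, n_hidden, n_actions = 4, 3, 2
m1 = Coagent(obs_dim, n_hidden)
m2 = Coagent(n_hidden, n_actions)

obs = np.ones(obs_dim)
h = m1.act(obs)
action = m2.act(np.eye(n_hidden)[h])
td_error = 0.5  # would come from the critic after observing the reward
for m in (m1, m2):
    m.update(td_error)
```

Note the design point the abstract emphasizes: neither module ever computes compatible features or sees the other's gradient; the scalar TD error is the only globally shared signal.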
Cite
Text
Thomas. "Policy Gradient Coagent Networks." Neural Information Processing Systems, 2011.Markdown
[Thomas. "Policy Gradient Coagent Networks." Neural Information Processing Systems, 2011.](https://mlanthology.org/neurips/2011/thomas2011neurips-policy/)BibTeX
@inproceedings{thomas2011neurips-policy,
  title = {{Policy Gradient Coagent Networks}},
  author = {Thomas, Philip S.},
  booktitle = {Neural Information Processing Systems},
  year = {2011},
  pages = {1944-1952},
  url = {https://mlanthology.org/neurips/2011/thomas2011neurips-policy/}
}