Reinforcement Learning by Probability Matching
Abstract
We present a new algorithm for associative reinforcement learning. The algorithm is based upon the idea of matching a network's output probability with a probability distribution derived from the environment's reward signal. This Probability Matching algorithm is shown to perform faster and be less susceptible to local minima than previously existing algorithms. We use Probability Matching to train mixture of experts networks, an architecture for which other reinforcement learning rules fail to converge reliably on even simple problems. This architecture is particularly well suited for our algorithm as it can compute arbitrarily complex functions yet calculation of the output probability is simple.
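To make the core idea concrete, below is a minimal sketch of probability matching on a single-context (bandit-style) task. The specific choices here are illustrative assumptions, not the paper's exact update rule: the reward-derived target is taken to be a Boltzmann distribution over actions, proportional to exp(r_bar(a) / T); the policy is a plain softmax over learnable logits; and the match is enforced by gradient descent on the cross-entropy between target and policy. Names such as reward_est and T are hypothetical.

import numpy as np

rng = np.random.default_rng(0)

n_actions = 4
true_rewards = np.array([0.2, 0.5, 1.0, 0.1])  # unknown to the learner
T = 0.25   # temperature of the assumed reward-derived target (assumption)
lr = 0.5   # learning rate for the logit update
logits = np.zeros(n_actions)
reward_est = np.zeros(n_actions)  # running estimate of E[r | a]
counts = np.zeros(n_actions)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

for step in range(2000):
    p = softmax(logits)
    a = rng.choice(n_actions, p=p)                      # sample action from the policy
    r = true_rewards[a] + 0.1 * rng.standard_normal()   # noisy reward signal

    # Update the running mean-reward estimate for the chosen action.
    counts[a] += 1
    reward_est[a] += (r - reward_est[a]) / counts[a]

    # Reward-derived target distribution (assumed Boltzmann form).
    target = softmax(reward_est / T)

    # Match the policy to the target: for a softmax parameterization, the
    # gradient of the cross-entropy H(target, p) w.r.t. the logits is (p - target).
    logits -= lr * (p - target)

print("learned policy:", np.round(softmax(logits), 3))
print("target policy: ", np.round(softmax(reward_est / T), 3))

As T shrinks, the assumed target concentrates on the highest-reward action, so the matched policy moves from exploration toward exploitation; the paper itself addresses the general associative setting and the mixture of experts architecture, which this single-context sketch does not cover.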
Cite
Text
Sabes and Jordan. "Reinforcement Learning by Probability Matching." Neural Information Processing Systems, 1995.
Markdown
[Sabes and Jordan. "Reinforcement Learning by Probability Matching." Neural Information Processing Systems, 1995.](https://mlanthology.org/neurips/1995/sabes1995neurips-reinforcement/)
BibTeX
@inproceedings{sabes1995neurips-reinforcement,
title = {{Reinforcement Learning by Probability Matching}},
author = {Sabes, Philip N. and Jordan, Michael I.},
booktitle = {Neural Information Processing Systems},
year = {1995},
pages = {1080--1086},
url = {https://mlanthology.org/neurips/1995/sabes1995neurips-reinforcement/}
}