Using Transitional Proximity for Faster Reinforcement Learning

Abstract

Why does reinforcement learning take so long? One major reason is that reward spreads too slowly through the agent's policy. When an agent receives reward, existing methods pass the reward back only to internal states along the current path. However, there are normally many possible paths to goal states, and the agent must follow each of them successfully one or more times in order to complete learning. Our algorithm learns the transitions between internal states so that rewards may be passed not only along the one path taken this trial, but also back through all transitions learned during previous trials. States closer to the current state receive correspondingly more of the current reward. We call this distance between states in transition space transitional proximity. We explain the basics of reinforcement learning, Q-learning, and Kohonen networks, and then formally develop Transitional Proximity Q-learning in this framework. Experimental results confirm faster learning and much quicker convergence to optimal policies.
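The idea in the abstract can be illustrated with a small sketch: standard tabular Q-learning on a chain world, extended so that each observed transition is recorded, and each reward-bearing update is also pushed back through previously recorded predecessor transitions with a weight that decays with distance. This is only a minimal reconstruction of the general idea, not the paper's exact formulation; the environment, the `DECAY` factor, and the predecessor bookkeeping are all illustrative assumptions.

```python
import random
from collections import defaultdict

# Illustrative chain MDP: states 0..N-1, action 0 moves left, action 1
# moves right, reward 1.0 only on reaching the goal state N-1.
# NOTE: environment and constants are assumptions for this sketch,
# not taken from the paper.
N = 8
ALPHA, GAMMA, DECAY = 0.5, 0.9, 0.5  # learning rate, discount, proximity decay

Q = defaultdict(float)    # Q[(state, action)]
preds = defaultdict(set)  # preds[s] = set of (prev_state, prev_action) seen entering s

def step(s, a):
    s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    return s2, (1.0 if s2 == N - 1 else 0.0)

def backup(s, a, target, weight):
    # Standard Q-learning update, scaled by a proximity weight.
    Q[(s, a)] += weight * ALPHA * (target - Q[(s, a)])

def update(s, a, r, s2):
    preds[s2].add((s, a))
    best_next = max(Q[(s2, 0)], Q[(s2, 1)])
    backup(s, a, r + GAMMA * best_next, 1.0)
    # Spread the update backward through previously learned transitions,
    # with weight decaying as states get transitionally farther away.
    frontier, weight, seen = {s}, DECAY, {s, s2}
    while frontier and weight > 1e-3:
        nxt = set()
        for st in frontier:
            best = max(Q[(st, 0)], Q[(st, 1)])
            for (sp, ap) in preds[st]:
                if sp not in seen:
                    backup(sp, ap, GAMMA * best, weight)
                    nxt.add(sp)
                    seen.add(sp)
        frontier, weight = nxt, weight * DECAY

random.seed(0)
for episode in range(50):
    s = 0
    for _ in range(50):
        # Epsilon-greedy action selection.
        if random.random() < 0.2:
            a = random.randrange(2)
        else:
            a = 0 if Q[(s, 0)] > Q[(s, 1)] else 1
        s2, r = step(s, a)
        update(s, a, r, s2)
        s = s2
        if s == N - 1:
            break

greedy_policy = [0 if Q[(s, 0)] > Q[(s, 1)] else 1 for s in range(N - 1)]
print(greedy_policy)
```

Because each update also reaches predecessors recorded on earlier trials, value can propagate along paths other than the one taken this trial, which is the intuition behind the speedup the abstract describes.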

Cite

Text

McCallum. "Using Transitional Proximity for Faster Reinforcement Learning." International Conference on Machine Learning, 1992. doi:10.1016/B978-1-55860-247-2.50045-0

Markdown

[McCallum. "Using Transitional Proximity for Faster Reinforcement Learning." International Conference on Machine Learning, 1992.](https://mlanthology.org/icml/1992/mccallum1992icml-using/) doi:10.1016/B978-1-55860-247-2.50045-0

BibTeX

@inproceedings{mccallum1992icml-using,
  title     = {{Using Transitional Proximity for Faster Reinforcement Learning}},
  author    = {McCallum, R. Andrew},
  booktitle = {International Conference on Machine Learning},
  year      = {1992},
  pages     = {316--321},
  doi       = {10.1016/B978-1-55860-247-2.50045-0},
  url       = {https://mlanthology.org/icml/1992/mccallum1992icml-using/}
}