Reinforcement Learning by Stochastic Hill Climbing on Discounted Reward
Abstract
Reinforcement learning systems are often required to find stochastic rather than deterministic policies, and to gain reward while they are still learning. Q-learning was not designed for stochastic policies and does not guarantee rational behavior partway through learning. This paper presents a new reinforcement learning approach, based on a simple credit assignment, for finding memory-less policies. It satisfies the above requirements by treating the policy and the exploration strategy as one and the same. Mathematical analysis shows that the proposed method is a stochastic gradient ascent on the discounted reward in Markov decision processes (MDPs) and is related to the average-reward framework. The analysis also assures that the proposed method can be extended to continuous environments. We further compare its behavior with that of Q-learning on a small MDP example and a non-Markovian one.
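The idea of stochastic gradient ascent on discounted reward with a stochastic, memory-less policy can be illustrated with a generic REINFORCE-style sketch. This is not the paper's exact credit-assignment scheme; the tiny two-state MDP, the softmax parameterization, and all constants below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny episodic MDP for illustration: 2 states, 2 actions.
# Action 1 yields reward 1 in either state, action 0 yields 0;
# the next state is drawn uniformly at random.
N_STATES, N_ACTIONS, GAMMA, HORIZON = 2, 2, 0.9, 10
theta = np.zeros((N_STATES, N_ACTIONS))  # logits of the stochastic policy

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def run_episode():
    """Sample a trajectory from the current stochastic policy."""
    s = rng.integers(N_STATES)
    traj = []
    for _ in range(HORIZON):
        a = rng.choice(N_ACTIONS, p=softmax(theta[s]))
        r = float(a == 1)
        traj.append((s, a, r))
        s = rng.integers(N_STATES)
    return traj

def update(traj, alpha=0.1):
    """Stochastic gradient ascent step on the discounted return."""
    G = 0.0
    # Walk backwards to accumulate the discounted return from each step,
    # then climb along the log-likelihood gradient of the taken action.
    for s, a, r in reversed(traj):
        G = r + GAMMA * G
        p = softmax(theta[s])
        grad = -p
        grad[a] += 1.0            # d/dtheta log pi(a|s)
        theta[s] += alpha * G * grad

for _ in range(2000):
    update(run_episode())

# After training, the policy should put most probability on action 1.
print(softmax(theta[0])[1], softmax(theta[1])[1])
```

Note that the same softmax distribution both defines the policy and drives exploration, which mirrors the abstract's point about treating the two identically.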
Cite
Text
Kimura et al. "Reinforcement Learning by Stochastic Hill Climbing on Discounted Reward." International Conference on Machine Learning, 1995. doi:10.1016/B978-1-55860-377-6.50044-X
Markdown
[Kimura et al. "Reinforcement Learning by Stochastic Hill Climbing on Discounted Reward." International Conference on Machine Learning, 1995.](https://mlanthology.org/icml/1995/kimura1995icml-reinforcement/) doi:10.1016/B978-1-55860-377-6.50044-X
BibTeX
@inproceedings{kimura1995icml-reinforcement,
title = {{Reinforcement Learning by Stochastic Hill Climbing on Discounted Reward}},
author = {Kimura, Hajime and Yamamura, Masayuki and Kobayashi, Shigenobu},
booktitle = {International Conference on Machine Learning},
year = {1995},
pages = {295-303},
doi = {10.1016/B978-1-55860-377-6.50044-X},
url = {https://mlanthology.org/icml/1995/kimura1995icml-reinforcement/}
}