Global Policy Construction in Modular Reinforcement Learning

Abstract

We propose a modular reinforcement learning algorithm that decomposes a Markov decision process into independent modules. Each module is trained using Sarsa(lambda). We introduce three algorithms for forming a global policy from the module policies, and demonstrate our results using a 2D grid world.
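
As a rough illustration of the setup described in the abstract, the sketch below shows a tabular Sarsa(lambda) learner for a single module, using accumulating eligibility traces. The abstract does not specify the three global-policy algorithms, so the `greedy_by_summed_q` helper is only an assumed placeholder that picks the action with the largest sum of module Q-values; it should not be read as one of the paper's methods. The class name, function name, and hyperparameter defaults are likewise hypothetical.

```python
from collections import defaultdict


class SarsaLambdaModule:
    """Tabular Sarsa(lambda) learner for one module (accumulating traces)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, lam=0.8):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.e = defaultdict(float)  # (state, action) -> eligibility trace

    def q_values(self, state):
        return [self.q[(state, a)] for a in range(self.n_actions)]

    def update(self, s, a, r, s_next, a_next, done):
        # On-policy TD target: uses the action actually chosen in s_next.
        target = r if done else r + self.gamma * self.q[(s_next, a_next)]
        delta = target - self.q[(s, a)]
        self.e[(s, a)] += 1.0
        # Propagate the TD error to every recently visited pair, then decay.
        for key in list(self.e):
            self.q[key] += self.alpha * delta * self.e[key]
            self.e[key] *= self.gamma * self.lam
        if done:
            self.e.clear()  # traces do not carry across episodes


def greedy_by_summed_q(modules, module_states):
    """Assumed placeholder aggregation, NOT taken from the paper:
    choose the action maximizing the sum of module Q-values."""
    n_actions = modules[0].n_actions
    totals = [0.0] * n_actions
    for module, state in zip(modules, module_states):
        for action, value in enumerate(module.q_values(state)):
            totals[action] += value
    return max(range(n_actions), key=totals.__getitem__)
```

In this sketch each module sees only its own state representation and reward, and the modules interact only when a global action is selected.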

Cite

Text

Zhang et al. "Global Policy Construction in Modular Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2015. doi:10.1609/AAAI.V29I1.9736

Markdown

[Zhang et al. "Global Policy Construction in Modular Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2015.](https://mlanthology.org/aaai/2015/zhang2015aaai-global/) doi:10.1609/AAAI.V29I1.9736

BibTeX

@inproceedings{zhang2015aaai-global,
  title     = {{Global Policy Construction in Modular Reinforcement Learning}},
  author    = {Zhang, Ruohan and Song, Zhao and Ballard, Dana H.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2015},
  pages     = {4226--4227},
  doi       = {10.1609/AAAI.V29I1.9736},
  url       = {https://mlanthology.org/aaai/2015/zhang2015aaai-global/}
}