Reinforcement Learning Through Global Stochastic Search in N-MDPs
Abstract
Reinforcement Learning (RL) in either fully or partially observable domains usually poses a requirement on the knowledge representation in order to be sound: the underlying stochastic process must be Markovian. In many applications, including those involving interactions between multiple agents (e.g., humans and robots), sources of uncertainty affect rewards and transition dynamics in such a way that a Markovian representation would be computationally very expensive. An alternative formulation of the decision problem involves partially specified behaviors with choice points. While this reduces the complexity of the policy space that must be explored - something that is crucial for realistic autonomous agents that must bound search time - it does render the domain Non-Markovian. In this paper, we present a novel algorithm for reinforcement learning in Non-Markovian domains. Our algorithm, Stochastic Search Monte Carlo, performs a global stochastic search in policy space, shaping the distribution from which the next policy is selected by estimating an upper bound on the value of each action. We experimentally show how, in challenging domains for RL, high-level decisions in Non-Markovian processes can lead to a behavior that is at least as good as the one learned by traditional algorithms, and can be achieved with significantly fewer samples.
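The abstract describes the search only at a high level: policies are drawn from a distribution that is reshaped by an optimistic estimate of each action's value. The sketch below is one illustrative way to realize that idea and is not the paper's Stochastic Search Monte Carlo algorithm. Every name in it (`stochastic_policy_search`, `softmax_sample`, `toy_return`), the softmax sampling rule, the use of the best observed mean return as a crude stand-in for an upper bound, and the toy task are assumptions made for this example.

```python
import math
import random


def softmax_sample(scores, temperature, rng):
    """Sample an index with probability proportional to exp(score / temperature)."""
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp((s - top) / temperature) for s in scores]
    return rng.choices(range(len(scores)), weights=weights, k=1)[0]


def stochastic_policy_search(evaluate, n_actions, iterations=200, episodes=5,
                             temperature=0.2, optimistic_init=1.0, seed=0):
    """Search over deterministic policies with one action per choice point.

    evaluate(policy, rng) returns one noisy episode return for `policy`.
    n_actions[c] is the number of actions available at choice point c.
    """
    rng = random.Random(seed)
    # bound[c][a]: best mean return seen among sampled policies that use
    # action a at choice point c. This is an assumed, crude proxy for an
    # upper bound; untried actions keep the optimistic initial value.
    bound = [[optimistic_init] * n for n in n_actions]
    tried = [[False] * n for n in n_actions]
    best_policy, best_value = None, -math.inf

    for _ in range(iterations):
        # Draw the next policy from the distribution shaped by the bounds.
        policy = tuple(softmax_sample(b, temperature, rng) for b in bound)
        # Monte Carlo evaluation: average the return of a few episodes.
        value = sum(evaluate(policy, rng) for _ in range(episodes)) / episodes
        for c, a in enumerate(policy):
            bound[c][a] = value if not tried[c][a] else max(bound[c][a], value)
            tried[c][a] = True
        if value > best_value:
            best_policy, best_value = policy, value
    return best_policy, best_value


def toy_return(policy, rng):
    """Hypothetical task: the payoff of the second choice depends on the first."""
    first, second = policy
    if (first, second) == (1, 2):
        mean = 1.0
    elif first == 0:
        mean = 0.3
    else:
        mean = 0.0
    return mean + rng.gauss(0.0, 0.05)


best_policy, best_value = stochastic_policy_search(toy_return, n_actions=[2, 3])
print(best_policy, round(best_value, 2))
```

Because each policy is scored by whole-episode returns rather than by bootstrapping from state values, the sketch does not rely on the Markov property. Taking the maximum of noisy means is biased upward, so a real implementation would need a more careful bound.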
Cite
Text
Leonetti et al. "Reinforcement Learning Through Global Stochastic Search in N-MDPs." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2011. doi:10.1007/978-3-642-23783-6_21
Markdown
[Leonetti et al. "Reinforcement Learning Through Global Stochastic Search in N-MDPs." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2011.](https://mlanthology.org/ecmlpkdd/2011/leonetti2011ecmlpkdd-reinforcement/) doi:10.1007/978-3-642-23783-6_21
BibTeX
@inproceedings{leonetti2011ecmlpkdd-reinforcement,
title = {{Reinforcement Learning Through Global Stochastic Search in N-MDPs}},
author = {Leonetti, Matteo and Iocchi, Luca and Ramamoorthy, Subramanian},
booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
year = {2011},
pages = {326--340},
doi = {10.1007/978-3-642-23783-6_21},
url = {https://mlanthology.org/ecmlpkdd/2011/leonetti2011ecmlpkdd-reinforcement/}
}