Using Inaccurate Models in Reinforcement Learning
Abstract
In the model-based policy search approach to reinforcement learning (RL), policies are found using a model (or "simulator") of the Markov decision process. However, for high-dimensional continuous-state tasks, it can be extremely difficult to build an accurate model, and thus the algorithm often returns a policy that works in simulation but not in real life. The other extreme, model-free RL, tends to require infeasibly large numbers of real-life trials. In this paper, we present a hybrid algorithm that requires only an approximate model and only a small number of real-life trials. The key idea is to successively "ground" the policy evaluations using real-life trials, but to rely on the approximate model to suggest local changes. Our theoretical results show that this algorithm achieves near-optimal performance in the real system, even when the model is only approximate. Empirical results also demonstrate that---when given only a crude model and a small number of real-life trials---our algorithm can obtain near-optimal performance in the real system.
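The key idea above can be sketched on a toy problem. The following is a minimal illustration (the 1-D drift system, affine policy, and step sizes are all invented here, not taken from the paper): the approximate model omits an unknown constant drift, each iteration runs one real-life trial, adds time-indexed corrections so the model reproduces that trajectory, and then takes a local policy-improvement step in the corrected model.

```python
import numpy as np

H = 20       # horizon
DRIFT = 0.5  # present in the real system, absent from the crude model
X0 = 1.0     # initial state

def real_step(x, u, t):
    # True dynamics: only real-life rollouts may call this.
    return x + u + DRIFT

def corrected_model(biases):
    # Crude model x + u, plus a time-indexed additive correction.
    return lambda x, u, t: x + u + biases[t]

def policy(theta, x):
    # Affine state-feedback controller (illustrative policy class).
    return -theta[0] * x - theta[1]

def rollout(step, theta):
    xs, x = [X0], X0
    for t in range(H):
        x = step(x, policy(theta, x), t)
        xs.append(x)
    return xs

def cost(xs):
    return sum(x * x for x in xs)

theta = np.zeros(2)
initial_cost = cost(rollout(real_step, theta))
for _ in range(300):
    # 1) One real-life trial with the current policy.
    xs_real = rollout(real_step, theta)
    # 2) "Ground" the model: corrections that make it reproduce the
    #    real trajectory exactly under the current policy.
    biases = [xs_real[t + 1] - (xs_real[t] + policy(theta, xs_real[t]))
              for t in range(H)]
    step = corrected_model(biases)
    # 3) Local improvement in the corrected model (finite-difference
    #    gradient, normalized so the parameter step size is controlled).
    grad = np.zeros(2)
    base = cost(rollout(step, theta))
    for i in range(2):
        th = theta.copy()
        th[i] += 1e-4
        grad[i] = (cost(rollout(step, th)) - base) / 1e-4
    theta -= 0.05 * grad / (np.linalg.norm(grad) + 1e-8)

final_cost = cost(rollout(real_step, theta))
print(initial_cost, final_cost)
```

In this toy setting the corrections recover the drift exactly, so local steps in the corrected model improve real-system performance; the paper's point is that even when the corrections only hold along the visited trajectory, the model remains useful for suggesting local changes.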
Cite
Text
Abbeel et al. "Using Inaccurate Models in Reinforcement Learning." International Conference on Machine Learning, 2006. doi:10.1145/1143844.1143845
Markdown
[Abbeel et al. "Using Inaccurate Models in Reinforcement Learning." International Conference on Machine Learning, 2006.](https://mlanthology.org/icml/2006/abbeel2006icml-using/) doi:10.1145/1143844.1143845
BibTeX
@inproceedings{abbeel2006icml-using,
title = {{Using Inaccurate Models in Reinforcement Learning}},
author = {Abbeel, Pieter and Quigley, Morgan and Ng, Andrew Y.},
booktitle = {International Conference on Machine Learning},
year = {2006},
pages = {1-8},
doi = {10.1145/1143844.1143845},
url = {https://mlanthology.org/icml/2006/abbeel2006icml-using/}
}