Using Inaccurate Models in Reinforcement Learning

Abstract

In the model-based policy search approach to reinforcement learning (RL), policies are found using a model (or "simulator") of the Markov decision process. However, for high-dimensional continuous-state tasks, it can be extremely difficult to build an accurate model, so the algorithm often returns a policy that works in simulation but not in real life. The other extreme, model-free RL, tends to require infeasibly large numbers of real-life trials. In this paper, we present a hybrid algorithm that requires only an approximate model and only a small number of real-life trials. The key idea is to successively "ground" the policy evaluations using real-life trials, but to rely on the approximate model to suggest local changes. Our theoretical results show that this algorithm achieves near-optimal performance in the real system, even when the model is only approximate. Empirical results also demonstrate that, when given only a crude model and a small number of real-life trials, our algorithm can obtain near-optimal performance in the real system.
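
To make the key idea concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm. It assumes a hypothetical scalar linear system, a deliberately wrong model of it, and a one-parameter linear policy; the names `REAL`, `MODEL`, `rollout_cost`, `model_gradient`, and `hybrid_search` are all invented for illustration. The crude model is used only to suggest a direction of change for the policy, while every policy evaluation that decides whether a change is kept comes from a (simulated) real trial.

```python
def rollout_cost(a, b, k, x0=1.0, horizon=20):
    """Total quadratic cost of the policy u = -k * x on the system x' = a*x + b*u."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -k * x
        cost += x * x + 0.1 * u * u
        x = a * x + b * u
    return cost


# Hypothetical dynamics parameters (a, b); both are assumptions for this toy example.
REAL = (1.2, 1.0)   # stands in for the real system, which the learner cannot inspect
MODEL = (1.0, 0.7)  # crude approximate model available to the learner


def model_gradient(k, eps=1e-4):
    """Central finite-difference gradient of the cost, computed in the model only."""
    up = rollout_cost(MODEL[0], MODEL[1], k + eps)
    down = rollout_cost(MODEL[0], MODEL[1], k - eps)
    return (up - down) / (2 * eps)


def hybrid_search(k=0.5, iterations=20, step=0.5):
    """Model suggests the local change; real trials decide whether to accept it."""
    real_cost = rollout_cost(REAL[0], REAL[1], k)  # real trial for the initial policy
    history = [real_cost]
    for _ in range(iterations):
        g = model_gradient(k)
        alpha, improved = step, False
        # Backtracking line search, with each candidate evaluated by a real trial.
        while alpha > 1e-6:
            candidate = k - alpha * g
            candidate_cost = rollout_cost(REAL[0], REAL[1], candidate)
            if candidate_cost < real_cost:
                k, real_cost, improved = candidate, candidate_cost, True
                break
            alpha *= 0.5
        history.append(real_cost)
        if not improved:
            break  # the model's direction no longer helps in the real system
    return k, history


if __name__ == "__main__":
    gain, costs = hybrid_search()
    print(f"final gain: {gain:.3f}")
    print(f"real cost: {costs[0]:.3f} -> {costs[-1]:.3f}")
```

Because a candidate is accepted only when a real trial confirms a lower cost, the real cost in this sketch never increases from one iteration to the next, even though the model's gradient is computed from the wrong dynamics.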

Cite

Text

Abbeel et al. "Using Inaccurate Models in Reinforcement Learning." International Conference on Machine Learning, 2006. doi:10.1145/1143844.1143845

Markdown

[Abbeel et al. "Using Inaccurate Models in Reinforcement Learning." International Conference on Machine Learning, 2006.](https://mlanthology.org/icml/2006/abbeel2006icml-using/) doi:10.1145/1143844.1143845

BibTeX

@inproceedings{abbeel2006icml-using,
  title     = {{Using Inaccurate Models in Reinforcement Learning}},
  author    = {Abbeel, Pieter and Quigley, Morgan and Ng, Andrew Y.},
  booktitle = {International Conference on Machine Learning},
  year      = {2006},
  pages     = {1--8},
  doi       = {10.1145/1143844.1143845},
  url       = {https://mlanthology.org/icml/2006/abbeel2006icml-using/}
}