Exploration and Apprenticeship Learning in Reinforcement Learning

Abstract

We consider reinforcement learning in systems with unknown dynamics. Algorithms such as E3 (Kearns and Singh, 2002) learn near-optimal policies by using "exploration policies" to drive the system towards poorly modeled states, so as to encourage exploration. But this makes these algorithms impractical for many systems; for example, on an autonomous helicopter, overly aggressive exploration may well result in a crash. In this paper, we consider the apprenticeship learning setting in which a teacher demonstration of the task is available. We show that, given the initial demonstration, no explicit exploration is necessary, and we can attain near-optimal performance (compared to the teacher) simply by repeatedly executing "exploitation policies" that try to maximize rewards. In finite-state MDPs, our algorithm scales polynomially in the number of states; in continuous-state linear dynamical systems, it scales polynomially in the dimension of the state. These results are proved using a martingale construction over relative losses.
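The loop the abstract describes, fit a dynamics model to the teacher's demonstration, then repeatedly plan and execute greedy "exploitation" policies while refitting on the data they generate, can be sketched for a small finite-state MDP. This is an illustrative toy only (tabular transition counts with Laplace smoothing, value iteration as the planner, a hypothetical `apprenticeship_rl` helper); it is not the paper's exact algorithm or its analysis.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, iters=200):
    """Greedy policy for a tabular model.
    P: (S, A, S) transition probabilities; R: (S,) state rewards."""
    S, A, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        Q = R[:, None] + gamma * (P @ V)  # (S, A) action values
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def apprenticeship_rl(true_P, R, demo, n_iters=5, horizon=50, rng=None):
    """Model-based loop with no explicit exploration: initialize the
    model from the teacher demonstration, then repeatedly plan and
    execute the exploitation policy, refitting on the data collected."""
    if rng is None:
        rng = np.random.default_rng(0)
    S, A, _ = true_P.shape
    counts = np.ones((S, A, S))          # Laplace-smoothed counts
    for (s, a, s2) in demo:              # teacher's (s, a, s') triples
        counts[s, a, s2] += 1
    for _ in range(n_iters):
        P_hat = counts / counts.sum(axis=2, keepdims=True)
        pi = value_iteration(P_hat, R)   # exploit the current model
        s = 0
        for _ in range(horizon):         # execute, logging transitions
            a = pi[s]
            s2 = rng.choice(S, p=true_P[s, a])
            counts[s, a, s2] += 1
            s = s2
    return pi

if __name__ == "__main__":
    # Toy 3-state chain: action 1 advances toward the rewarding state 2,
    # action 0 stays put; the teacher demonstrates the advancing path.
    true_P = np.zeros((3, 2, 3))
    for s in range(3):
        true_P[s, 0, s] = 1.0
        true_P[s, 1, min(s + 1, 2)] = 1.0
    R = np.array([0.0, 0.0, 1.0])
    demo = [(0, 1, 1), (1, 1, 2), (2, 1, 2)] * 3
    print(apprenticeship_rl(true_P, R, demo))
```

In this toy run the greedy policies alone recover the teacher's behavior: wherever the planner's model is wrong, executing the exploitation policy generates exactly the data that corrects it, which is the intuition behind the paper's result.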

Cite

Text

Abbeel and Ng. "Exploration and Apprenticeship Learning in Reinforcement Learning." International Conference on Machine Learning, 2005. doi:10.1145/1102351.1102352

Markdown

[Abbeel and Ng. "Exploration and Apprenticeship Learning in Reinforcement Learning." International Conference on Machine Learning, 2005.](https://mlanthology.org/icml/2005/abbeel2005icml-exploration/) doi:10.1145/1102351.1102352

BibTeX

@inproceedings{abbeel2005icml-exploration,
  title     = {{Exploration and Apprenticeship Learning in Reinforcement Learning}},
  author    = {Abbeel, Pieter and Ng, Andrew Y.},
  booktitle = {International Conference on Machine Learning},
  year      = {2005},
  pages     = {1--8},
  doi       = {10.1145/1102351.1102352},
  url       = {https://mlanthology.org/icml/2005/abbeel2005icml-exploration/}
}