Exploration and Apprenticeship Learning in Reinforcement Learning
Abstract
We consider reinforcement learning in systems with unknown dynamics. Algorithms such as E3 (Kearns and Singh, 2002) learn near-optimal policies by using "exploration policies" to drive the system towards poorly modeled states, so as to encourage exploration. But this makes these algorithms impractical for many systems; for example, on an autonomous helicopter, overly aggressive exploration may well result in a crash. In this paper, we consider the apprenticeship learning setting in which a teacher demonstration of the task is available. We show that, given the initial demonstration, no explicit exploration is necessary, and we can attain near-optimal performance (compared to the teacher) simply by repeatedly executing "exploitation policies" that try to maximize rewards. In finite-state MDPs, our algorithm scales polynomially in the number of states; in continuous-state linear dynamical systems, it scales polynomially in the dimension of the state. These results are proved using a martingale construction over relative losses.
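The exploitation-only loop the abstract describes — fit a model to the teacher's demonstration plus all data gathered so far, plan greedily against the fitted model, execute, and refit — can be sketched in a few lines. This is a minimal illustrative sketch on a toy chain MDP, not the paper's exact algorithm or analysis; the chain environment, the known-reward simplification, and the iteration counts are all assumptions made for brevity.

```python
import numpy as np

# Toy finite MDP: a 5-state chain; action 1 moves right, action 0 moves left.
# Reward 1.0 only in the rightmost state. (All names here are illustrative.)
n_states, n_actions, gamma, horizon = 5, 2, 0.95, 20

def true_step(s, a):
    s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == n_states - 1)

def value_iteration(P, R, iters=200):
    # Greedy policy for the *estimated* model (P[s,a,s'], R[s,a]).
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = R + gamma * P @ V
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def rollout(policy, record):
    s, total = 0, 0.0
    for _ in range(horizon):
        a = policy[s]
        s2, r = true_step(s, a)
        record.append((s, a, s2))
        total += r
        s = s2
    return total

# 1) Teacher demonstration: always move right (near-optimal in this chain).
data = []
teacher_return = rollout(np.ones(n_states, dtype=int), data)

# 2) Exploitation-only loop: fit transition estimates to all data seen so
#    far, plan greedily against the fitted model, execute, refit.
#    No explicit exploration policy or exploration bonus is used.
for _ in range(5):
    counts = np.zeros((n_states, n_actions, n_states))
    for (s, a, s2) in data:
        counts[s, a, s2] += 1
    totals = counts.sum(axis=2, keepdims=True)
    P = np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_states)
    R = np.array([[true_step(s, a)[1] for a in range(n_actions)]
                  for s in range(n_states)])  # rewards assumed known, for brevity
    policy = value_iteration(P, R)
    learner_return = rollout(policy, data)

print(teacher_return, learner_return)
```

In this sketch the learner's return matches the teacher's within a few iterations: every state-action pair the greedy policy actually visits gets its model estimate corrected, which is the intuition behind the paper's result that no explicit exploration is needed once a demonstration is available.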
Cite
Abbeel and Ng. "Exploration and Apprenticeship Learning in Reinforcement Learning." International Conference on Machine Learning, 2005. doi:10.1145/1102351.1102352
BibTeX
@inproceedings{abbeel2005icml-exploration,
title = {{Exploration and Apprenticeship Learning in Reinforcement Learning}},
author = {Abbeel, Pieter and Ng, Andrew Y.},
booktitle = {International Conference on Machine Learning},
year = {2005},
  pages = {1--8},
doi = {10.1145/1102351.1102352},
url = {https://mlanthology.org/icml/2005/abbeel2005icml-exploration/}
}