Learning from Demonstration
Abstract
By now it is widely accepted that learning a task from scratch, i.e., without any prior knowledge, is a daunting undertaking. Humans, however, rarely attempt to learn from scratch. They extract initial biases as well as strategies for how to approach a learning problem from instructions and/or demonstrations of other humans. For learning control, this paper investigates how learning from demonstration can be applied in the context of reinforcement learning. We consider priming the Q-function, the value function, the policy, and the model of the task dynamics as possible areas where demonstrations can speed up learning. In general nonlinear learning problems, only model-based reinforcement learning shows significant speed-up after a demonstration, while in the special case of linear quadratic regulator (LQR) problems, all methods profit from the demonstration. In an implementation of pole balancing on a complex anthropomorphic robot arm, we demonstrate that, when facing the complexities of real signal processing, model-based reinforcement learning offers the most robustness for LQR problems. Using the suggested methods, the robot learns pole balancing in just a single trial after a 30 second long demonstration by the human instructor.
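As a rough illustration of the model-based route the abstract highlights, the sketch below (not from the paper) fits a linear dynamics model to demonstration data by least squares and derives an LQR controller from the identified model. All names, the toy system, and the cost weights are placeholder assumptions, not the paper's actual setup.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def fit_linear_model(states, actions, next_states):
    # Least-squares fit of x_{t+1} ~ A x_t + B u_t from demonstration data.
    Z = np.hstack([states, actions])                    # (T, n + m)
    theta, *_ = np.linalg.lstsq(Z, next_states, rcond=None)
    n = states.shape[1]
    return theta[:n].T, theta[n:].T                     # A (n, n), B (n, m)

def lqr_gain(A, B, Q, R):
    # Discrete-time LQR: solve the Riccati equation, then K = (R + B'PB)^{-1} B'PA.
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Hypothetical demonstration data: a toy linear system standing in for the
# 30-second human demonstration recorded in the paper.
rng = np.random.default_rng(0)
n, m, T = 4, 1, 200
A_true = np.eye(n) + 0.01 * rng.standard_normal((n, n))
B_true = 0.1 * rng.standard_normal((n, m))
states = rng.standard_normal((T, n))
actions = rng.standard_normal((T, m))
next_states = states @ A_true.T + actions @ B_true.T

A_hat, B_hat = fit_linear_model(states, actions, next_states)
K = lqr_gain(A_hat, B_hat, Q=np.eye(n), R=0.1 * np.eye(m))
u = -K @ states[-1]   # control computed from the demonstration-primed model
```

The point of the sketch is that a short demonstration can supply enough data to identify the task dynamics, after which a controller follows in closed form for the LQR case, which is consistent with the single-trial result reported above.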
Cite
Text
Schaal. "Learning from Demonstration." Neural Information Processing Systems, 1996.Markdown
[Schaal. "Learning from Demonstration." Neural Information Processing Systems, 1996.](https://mlanthology.org/neurips/1996/schaal1996neurips-learning/)BibTeX
@inproceedings{schaal1996neurips-learning,
  title = {{Learning from Demonstration}},
  author = {Schaal, Stefan},
  booktitle = {Neural Information Processing Systems},
  year = {1996},
  pages = {1040-1046},
  url = {https://mlanthology.org/neurips/1996/schaal1996neurips-learning/}
}