Learning from Demonstration

Abstract

By now it is widely accepted that learning a task from scratch, i.e., without any prior knowledge, is a daunting undertaking. Humans, however, rarely attempt to learn from scratch. They extract initial biases as well as strategies for how to approach a learning problem from instructions and/or demonstrations given by other humans. For learning control, this paper investigates how learning from demonstration can be applied in the context of reinforcement learning. We consider priming the Q-function, the value function, the policy, and the model of the task dynamics as possible areas where demonstrations can speed up learning. In general nonlinear learning problems, only model-based reinforcement learning shows significant speed-up after a demonstration, while in the special case of linear quadratic regulator (LQR) problems, all methods profit from the demonstration. In an implementation of pole balancing on a complex anthropomorphic robot arm, we demonstrate that, when facing the complexities of real signal processing, model-based reinforcement learning offers the most robustness for LQR problems. Using the suggested methods, the robot learns pole balancing in just a single trial after a 30-second-long demonstration by the human instructor.
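To make the model-based idea concrete, the following is a minimal sketch (not the paper's implementation) of how a demonstration could prime model-based reinforcement learning for an LQR problem: fit a linear dynamics model x' ≈ Ax + Bu from demonstration data by least squares, then derive a feedback controller by iterating the discrete-time Riccati equation. The function names, the toy system, and the synthetic "demonstration" data are all illustrative assumptions.

    # Minimal sketch, assuming a linear system and a recorded demonstration.
    import numpy as np

    def fit_linear_model(states, actions, next_states):
        """Least-squares fit of x' ~ A x + B u from demonstration triples."""
        X = np.hstack([states, actions])               # (T, n+m) regressors
        theta, *_ = np.linalg.lstsq(X, next_states, rcond=None)
        n = states.shape[1]
        return theta[:n].T, theta[n:].T                # A (n,n), B (n,m)

    def lqr_gain(A, B, Q, R, iters=500):
        """Iterate the discrete-time Riccati equation to get the gain K."""
        P = Q.copy()
        for _ in range(iters):
            K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
            P = Q + A.T @ P @ (A - B @ K)
        return K                                       # control law: u = -K x

    # Toy 2-state, 1-action system standing in for the pole-balancing task.
    rng = np.random.default_rng(0)
    A_true = np.array([[1.0, 0.02], [0.3, 1.0]])
    B_true = np.array([[0.0], [0.05]])
    states, actions, next_states = [], [], []
    x = np.zeros(2)
    for t in range(200):
        if t % 25 == 0:
            x = np.zeros(2)                            # short demonstration episodes
        u = rng.normal(size=1)                         # exploratory demonstrator input
        x_next = A_true @ x + B_true @ u
        states.append(x); actions.append(u); next_states.append(x_next)
        x = x_next

    # Prime the model from the demonstration, then compute the LQR policy.
    A_hat, B_hat = fit_linear_model(np.array(states), np.array(actions),
                                    np.array(next_states))
    K = lqr_gain(A_hat, B_hat, Q=np.eye(2), R=np.eye(1))
    print("estimated A:\n", A_hat, "\nLQR gain K:", K)

In this sketch the demonstration is used only to identify the dynamics model; the controller itself is then computed from that model, which mirrors why a single demonstration can suffice in the LQR setting.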

Cite

Text

Schaal. "Learning from Demonstration." Neural Information Processing Systems, 1996.

Markdown

[Schaal. "Learning from Demonstration." Neural Information Processing Systems, 1996.](https://mlanthology.org/neurips/1996/schaal1996neurips-learning/)

BibTeX

@inproceedings{schaal1996neurips-learning,
  title     = {{Learning from Demonstration}},
  author    = {Schaal, Stefan},
  booktitle = {Neural Information Processing Systems},
  year      = {1996},
  pages     = {1040--1046},
  url       = {https://mlanthology.org/neurips/1996/schaal1996neurips-learning/}
}