Learning from Demonstration Using MDP Induced Metrics

Abstract

In this paper we address the problem of learning a policy from demonstration. Assuming that the policy to be learned is the optimal policy for an underlying MDP, we propose a novel way of leveraging the underlying MDP structure in a kernel-based approach. Our proposed approach rests on the insight that the MDP structure can be encapsulated into an adequate state-space metric. In particular, we show that, using MDP metrics, we can cast the problem of learning from demonstration as a classification problem and attain generalization performance similar to that of methods based on inverse reinforcement learning, at a much lower online computational cost. Our method also attains better generalization than other supervised learning methods that fail to consider the MDP structure.
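As an illustrative sketch of the idea described in the abstract (not the authors' implementation), the following shows how a kernel-based classifier over demonstrated state-action pairs can predict actions at new states once a state-space metric is available. The function name `kernel_policy`, the Gaussian kernel choice, and the Euclidean stand-in for the MDP-induced metric are all assumptions for illustration; in the paper the metric would be derived from the MDP structure.

```python
# Sketch: kernel-weighted action classification from demonstrations.
# The metric d(s, s') is a placeholder; the paper's contribution is to
# derive it from the MDP (a bisimulation-style, MDP-induced metric).
import math

def kernel_policy(demo_states, demo_actions, query, d, bandwidth=1.0):
    """Predict the demonstrated action at `query` by kernel-weighted voting.

    demo_states  : demonstrated states
    demo_actions : actions taken at those states
    query        : state whose action we want to predict
    d            : metric on the state space (MDP-induced in the paper)
    """
    # Gaussian kernel weight for each demonstration, based on the metric
    weights = [math.exp(-d(query, s) ** 2 / bandwidth ** 2)
               for s in demo_states]
    # Accumulate weight per action and return the highest-scoring one
    scores = {}
    for a, w in zip(demo_actions, weights):
        scores[a] = scores.get(a, 0.0) + w
    return max(scores, key=scores.get)

# Toy usage: Euclidean distance standing in for an MDP metric
states = [0.0, 1.0, 5.0, 6.0]
actions = ["left", "left", "right", "right"]
euclid = lambda s, t: abs(s - t)
print(kernel_policy(states, actions, 0.5, euclid))   # left
print(kernel_policy(states, actions, 5.5, euclid))   # right
```

The key design point the paper argues is that the quality of generalization hinges on the metric `d`: a metric that reflects MDP structure groups states where the optimal policy agrees, whereas a naive metric (like the Euclidean stand-in above) does not.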

Cite

Text

Melo and Lopes. "Learning from Demonstration Using MDP Induced Metrics." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2010. doi:10.1007/978-3-642-15883-4_25

Markdown

[Melo and Lopes. "Learning from Demonstration Using MDP Induced Metrics." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2010.](https://mlanthology.org/ecmlpkdd/2010/melo2010ecmlpkdd-learning/) doi:10.1007/978-3-642-15883-4_25

BibTeX

@inproceedings{melo2010ecmlpkdd-learning,
  title     = {{Learning from Demonstration Using MDP Induced Metrics}},
  author    = {Melo, Francisco S. and Lopes, Manuel},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2010},
  pages     = {385--401},
  doi       = {10.1007/978-3-642-15883-4_25},
  url       = {https://mlanthology.org/ecmlpkdd/2010/melo2010ecmlpkdd-learning/}
}