Generalized Inverse Reinforcement Learning with Linearly Solvable MDP

Abstract

In this paper, we consider a generalized variant of inverse reinforcement learning (IRL) that estimates both a cost (negative reward) function and a transition probability from observed optimal behavior. In theoretical studies of standard IRL, which estimates only the cost function, it is well known that IRL suffers from a non-identifiability problem, i.e., the cost function cannot be determined uniquely. For standard IRL, this problem has been resolved by using a class of Markov decision process (MDP) called the linearly solvable MDP (LMDP). In this paper, we investigate whether a non-identifiability problem also occurs in the generalized variant of IRL (gIRL) under the LMDP framework, and we construct a new gIRL method. The contributions of this study are as follows: (i) we point out that gIRL with LMDP suffers from a non-identifiability problem; (ii) we propose a Bayesian method that circumvents this problem; (iii) we validate the proposed method through experiments on synthetic data and on real car probe data.
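
The identifiability contrast in the abstract can be illustrated with a small numerical sketch. It assumes a Todorov-style discrete LMDP in the infinite-horizon average-cost setting, with the average cost normalised to zero, where the optimal policy is u(x'|x) = p(x'|x) z(x') / G[z](x) with G[z](x) = sum_y p(y|x) z(y), and the desirability z satisfies z(x) = exp(-q(x)) G[z](x). This is not the authors' construction or their Bayesian method, and the helpers `policy_from`, `cost_from` and `dynamics_from` are hypothetical names introduced here. With p known (standard IRL), z is fixed up to scale by the observed policy and q is recovered exactly; with p unknown (gIRL), any positive z yields a different (p, q) pair that reproduces the same optimal behaviour.

```python
import math
import random

# Assumed setting: discrete LMDP, state cost q(x), passive dynamics p(x'|x),
# desirability z(x) = exp(-v(x)). Average cost normalised to 0 (this fixes the
# additive constant in q), so z(x) = exp(-q(x)) * G[z](x).

def expect(p_row, z):
    """G[z](x) = sum_y p(y|x) z(y) for one row of the passive dynamics."""
    return sum(pj * zj for pj, zj in zip(p_row, z))

def policy_from(p, z):
    """Optimal controlled transitions u(x'|x) = p(x'|x) z(x') / G[z](x)."""
    u = []
    for row in p:
        g = expect(row, z)
        u.append([pj * zj / g for pj, zj in zip(row, z)])
    return u

def cost_from(p, z):
    """State cost q(x) = log G[z](x) - log z(x), which makes z a fixed point."""
    return [math.log(expect(row, z)) - math.log(zx) for row, zx in zip(p, z)]

def dynamics_from(u, z):
    """Hypothetical helper: passive dynamics p(x'|x) proportional to u(x'|x) / z(x')."""
    p = []
    for row in u:
        w = [uj / zj for uj, zj in zip(row, z)]
        s = sum(w)
        p.append([wj / s for wj in w])
    return p

def flatten(m):
    """Flatten a vector or a matrix (list of lists) into a flat list."""
    return [v for r in m for v in (r if isinstance(r, list) else [r])]

def max_gap(a, b):
    """Largest absolute elementwise difference between two vectors or matrices."""
    return max(abs(x - y) for x, y in zip(flatten(a), flatten(b)))

# Synthetic ground truth with strictly positive passive dynamics.
random.seed(0)
n = 4
rows = [[random.random() + 0.1 for _ in range(n)] for _ in range(n)]
p_true = [[v / sum(r) for v in r] for r in rows]
z_true = [random.random() + 0.1 for _ in range(n)]
q_true = cost_from(p_true, z_true)
u_obs = policy_from(p_true, z_true)  # the "observed" optimal behaviour

# Standard IRL: p is known, so u(x'|x) / p(x'|x) gives z up to a scale factor,
# and that scale cancels inside cost_from, so q is recovered exactly.
z_hat = [u_obs[0][j] / p_true[0][j] for j in range(n)]
q_hat = cost_from(p_true, z_hat)

# Generalized IRL: p is unknown. Any positive z_alt yields a pair (p_alt, q_alt)
# reproducing the same optimal behaviour, so (p, q) is not identifiable.
z_alt = [float(k + 1) for k in range(n)]
p_alt = dynamics_from(u_obs, z_alt)
q_alt = cost_from(p_alt, z_alt)

print("standard IRL cost error:", max_gap(q_hat, q_true))
print("gIRL policy mismatch:   ", max_gap(policy_from(p_alt, z_alt), u_obs))
print("gIRL cost difference:   ", max_gap(q_alt, q_true))
```

The first two printed gaps are at floating-point round-off level, while the third is not. Setting `z_alt` to all ones gives the degenerate explanation `p_alt == u_obs` with zero cost in every state, which suggests why some additional structure, such as a prior, is needed to choose among the explanations.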

Cite

Text

Kohjima et al. "Generalized Inverse Reinforcement Learning with Linearly Solvable MDP." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2017. doi:10.1007/978-3-319-71246-8_23

BibTeX

@inproceedings{kohjima2017ecmlpkdd-generalized,
  title     = {{Generalized Inverse Reinforcement Learning with Linearly Solvable MDP}},
  author    = {Kohjima, Masahiro and Matsubayashi, Tatsushi and Sawada, Hiroshi},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2017},
  pages     = {373--388},
  doi       = {10.1007/978-3-319-71246-8_23},
  url       = {https://mlanthology.org/ecmlpkdd/2017/kohjima2017ecmlpkdd-generalized/}
}