Inverse Reinforcement Learning with Multiple Planning Horizons

Abstract

In this work, we study an inverse reinforcement learning (IRL) problem where the experts plan *under a shared reward function but with different, unknown planning horizons*. Without knowledge of the discount factors, the reward function admits a larger feasible solution set, which makes it harder to identify. To overcome this challenge, we develop an algorithm that, in practice, learns a reward function close to the true reward function. We also give an empirical characterization of the identifiability and generalizability of the feasible set of the reward function.
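As background for why unknown horizons enlarge the feasible set, here is a minimal sketch using the classical finite-MDP feasibility condition for IRL (Ng & Russell, 2000). This formulation is illustrative only and is not taken from the paper; the symbols \(P_{\pi}\), \(P_a\), \(R\), and \(\gamma\) are assumed notation.

```latex
% Feasible rewards for an observed optimal policy \pi in a finite MDP,
% assuming transition matrices P_\pi, P_a and a KNOWN discount factor \gamma
% (classical condition of Ng & Russell, 2000):
\[
  \mathcal{R}(\gamma) \;=\;
  \bigl\{\, R \;:\; (P_{\pi} - P_{a})\,(I - \gamma P_{\pi})^{-1} R \,\succeq\, 0
  \quad \forall a \,\bigr\}.
\]
% If the expert's planning horizon (discount factor) is unknown, any
% admissible \gamma could explain the demonstrations, so the feasible set
% grows to the union over horizons:
\[
  \mathcal{R} \;=\; \bigcup_{\gamma \in (0,1)} \mathcal{R}(\gamma)
  \;\supseteq\; \mathcal{R}(\gamma^{\ast}),
\]
% which is at least as large as the set under the true \gamma^{\ast},
% illustrating why the reward is harder to identify without known horizons.
```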

Cite

Text

Yao et al. "Inverse Reinforcement Learning with Multiple Planning Horizons." NeurIPS 2023 Workshops: GenPlan, 2023.

Markdown

[Yao et al. "Inverse Reinforcement Learning with Multiple Planning Horizons." NeurIPS 2023 Workshops: GenPlan, 2023.](https://mlanthology.org/neuripsw/2023/yao2023neuripsw-inverse/)

BibTeX

@inproceedings{yao2023neuripsw-inverse,
  title     = {{Inverse Reinforcement Learning with Multiple Planning Horizons}},
  author    = {Yao, Jiayu and Doshi-Velez, Finale and Engelhardt, Barbara},
  booktitle = {NeurIPS 2023 Workshops: GenPlan},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/yao2023neuripsw-inverse/}
}