Contextual Policies Enable Efficient and Interpretable Inverse Reinforcement Learning for Populations

Abstract

Inverse reinforcement learning (IRL) methods learn a reward function from expert demonstrations such as human behavior, offering a practical solution for crafting reward functions for complex environments. However, IRL is computationally expensive when applied to large populations of demonstrators, as existing IRL algorithms require solving a separate reinforcement learning (RL) problem for each individual. We propose a new IRL approach that relies on contextual RL, where an optimal policy is learned for multiple contexts. We first learn a contextual policy that provides the RL solution directly for a parametric family of reward functions, and then reuse it for IRL on each individual within the population. We motivate our method within the scenario of AI-driven playtesting of video games, and focus on an interpretable family of reward functions. We evaluate the method on a navigation task and the battle arena game Derk, where it successfully recovers distinct player reward preferences from a simulated population and provides substantial time savings compared to a solid baseline of adversarial IRL.
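
The two-stage idea in the abstract (obtain one policy conditioned on the reward parameters, then reuse it to fit each demonstrator) can be illustrated with a deliberately small sketch. This is not the authors' implementation: the one-step task, the feature matrix `PHI`, the temperature `BETA`, the single reward parameter `alpha`, and the grid-search likelihood fit are all assumptions made for illustration. In this toy setting the soft-optimal policy has a closed form, so it stands in for a contextual policy that would otherwise be trained with contextual RL.

```python
import numpy as np

# Hypothetical toy task: one state, 4 actions, each described by 2
# interpretable features (for example "loot collected" and "risk taken").
PHI = np.array([[1.0, 0.0], [0.7, 0.4], [0.3, 0.8], [0.0, 1.0]])
BETA = 5.0  # assumed softmax temperature of the demonstrators


def contextual_policy(alpha):
    """Action probabilities under the reward alpha*phi_0 + (1-alpha)*phi_1.

    Stand-in for a trained contextual policy: in this one-step problem the
    soft-optimal policy is a softmax over rewards, so no training is needed.
    """
    w = np.array([alpha, 1.0 - alpha])
    logits = BETA * PHI @ w
    logits -= logits.max()  # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def infer_alpha(actions, grid=np.linspace(0.0, 1.0, 101)):
    """Maximum-likelihood reward parameter for one demonstrator.

    Only evaluates the shared contextual policy on a grid of candidate
    parameters; no RL problem is solved per individual.
    """
    counts = np.bincount(actions, minlength=len(PHI))
    log_liks = [counts @ np.log(contextual_policy(a)) for a in grid]
    return float(grid[int(np.argmax(log_liks))])


# Simulated population: three demonstrators with different preferences.
rng = np.random.default_rng(0)
true_alphas = [0.1, 0.5, 0.9]
estimates = []
for alpha in true_alphas:
    demos = rng.choice(len(PHI), size=2000, p=contextual_policy(alpha))
    estimates.append(infer_alpha(demos))
```

Comparing `estimates` with `true_alphas` shows how each demonstrator's preference is recovered by policy evaluations alone. The per-individual cost is a handful of forward passes through the shared policy, which is the source of the time savings the abstract describes; the recovered `alpha` is directly readable as a trade-off between the two features.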

Cite

Text

Tanskanen et al. "Contextual Policies Enable Efficient and Interpretable Inverse Reinforcement Learning for Populations." Transactions on Machine Learning Research, 2024.

Markdown

[Tanskanen et al. "Contextual Policies Enable Efficient and Interpretable Inverse Reinforcement Learning for Populations." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/tanskanen2024tmlr-contextual/)

BibTeX

@article{tanskanen2024tmlr-contextual,
  title     = {{Contextual Policies Enable Efficient and Interpretable Inverse Reinforcement Learning for Populations}},
  author    = {Tanskanen, Ville and Rajani, Chang and Hämäläinen, Perttu and Guckelsberger, Christian and Klami, Arto},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/tanskanen2024tmlr-contextual/}
}