Inverse Reinforcement Learning from Like-Minded Teachers

Abstract

We study the problem of learning a policy in a Markov decision process (MDP) based on observations of the actions taken by multiple teachers. We assume that the teachers are like-minded in that their reward functions -- while different from each other -- are random perturbations of an underlying reward function. Under this assumption, we demonstrate that inverse reinforcement learning algorithms that satisfy a certain property -- that of matching feature expectations -- yield policies that are approximately optimal with respect to the underlying reward function, and that no algorithm can do better in the worst case. We also show how to efficiently recover the optimal policy when the MDP has one state -- a setting that is akin to multi-armed bandits.
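To make the setting concrete, below is a minimal, self-contained sketch (not the paper's algorithm) of the one-state case the abstract mentions, where the MDP reduces to something bandit-like. It assumes rewards are linear in known arm features, each teacher's weight vector is a Gaussian perturbation of an underlying vector, and the learner outputs a mixed policy over arms whose feature expectation matches the teachers' average demonstration features. All names and parameters (n_teachers, sigma, and so on) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n_arms, n_features, n_teachers, sigma = 10, 5, 200, 0.3
Phi = rng.normal(size=(n_arms, n_features))      # feature vector of each arm
w_star = rng.normal(size=n_features)             # underlying reward weights

# Each like-minded teacher optimizes a perturbed reward and demonstrates its
# best arm; in the one-state case its feature expectation is that arm's features.
teacher_ws = w_star + sigma * rng.normal(size=(n_teachers, n_features))
chosen = np.argmax(teacher_ws @ Phi.T, axis=1)
mu_bar = Phi[chosen].mean(axis=0)                # average demonstration features

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u > css / np.arange(1, len(v) + 1))[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

# Feature-expectation matching: find a distribution p over arms with
# Phi^T p close to mu_bar, via projected gradient descent on the simplex.
p = np.full(n_arms, 1.0 / n_arms)
lr = 1.0 / np.linalg.norm(Phi, 2) ** 2           # step size below 1/L for the quadratic
for _ in range(2000):
    grad = Phi @ (Phi.T @ p - mu_bar)            # gradient of 0.5 * ||Phi^T p - mu_bar||^2
    p = project_to_simplex(p - lr * grad)

matched_value = (Phi.T @ p) @ w_star             # value of the matching policy under w_star
optimal_value = (Phi @ w_star).max()             # value of the truly optimal arm
print(f"matching-policy value {matched_value:.3f} vs optimal {optimal_value:.3f}")
```

With more teachers and smaller perturbations, the matching policy's value in this toy simulation approaches the optimal value under w_star, which is the qualitative flavor of the guarantee the abstract describes for feature-expectation-matching IRL algorithms.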

Cite

Text

Noothigattu et al. "Inverse Reinforcement Learning from Like-Minded Teachers." AAAI Conference on Artificial Intelligence, 2021. doi:10.1609/AAAI.V35I10.17110

Markdown

[Noothigattu et al. "Inverse Reinforcement Learning from Like-Minded Teachers." AAAI Conference on Artificial Intelligence, 2021.](https://mlanthology.org/aaai/2021/noothigattu2021aaai-inverse/) doi:10.1609/AAAI.V35I10.17110

BibTeX

@inproceedings{noothigattu2021aaai-inverse,
  title     = {{Inverse Reinforcement Learning from Like-Minded Teachers}},
  author    = {Noothigattu, Ritesh and Yan, Tom and Procaccia, Ariel D.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2021},
  pages     = {9197--9204},
  doi       = {10.1609/AAAI.V35I10.17110},
  url       = {https://mlanthology.org/aaai/2021/noothigattu2021aaai-inverse/}
}