OptionGAN: Learning Joint Reward-Policy Options Using Generative Adversarial Inverse Reinforcement Learning

Abstract

Reinforcement learning has shown promise in learning policies that can solve complex problems. However, manually specifying a good reward function can be difficult, especially for intricate tasks. Inverse reinforcement learning offers a useful paradigm to learn the underlying reward function directly from expert demonstrations. Yet in reality, the corpus of demonstrations may contain trajectories arising from a diverse set of underlying reward functions rather than a single one; it is therefore useful to consider such a decomposition in inverse reinforcement learning. The options framework in reinforcement learning is specifically designed to decompose policies in an analogous way. We therefore extend the options framework and propose a method to simultaneously recover reward options in addition to policy options. We leverage adversarial methods to learn joint reward-policy options using only observed expert states. This approach performs well in both simple and complex continuous control tasks and yields significant performance gains in one-shot transfer learning.
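The core mechanism the abstract describes, an adversarial discriminator decomposed into option-specific reward approximators mixed by a state-conditioned gating network, can be illustrated with a toy sketch. This is not the authors' implementation: the 2-D data, linear per-option discriminators, and squared-error objective below are simplifications chosen for brevity (the paper uses deep networks and a GAN-style cross-entropy objective).

```python
import numpy as np

# Hypothetical sketch: expert states come from two clusters (two underlying
# "intentions"); policy states come from one broad distribution. A gating
# network softly assigns each state to an option, and each option has its
# own discriminator whose output can serve as that option's reward signal.
rng = np.random.default_rng(0)
DIM, N_OPTIONS, N = 2, 2, 256

expert = np.vstack([rng.normal([2, 2], 0.3, (N // 2, DIM)),
                    rng.normal([-2, -2], 0.3, (N // 2, DIM))])
policy = rng.normal(0.0, 2.0, (N, DIM))

W_d = rng.normal(0, 0.1, (N_OPTIONS, DIM))  # per-option discriminator weights
b_d = np.zeros(N_OPTIONS)
W_g = rng.normal(0, 0.1, (N_OPTIONS, DIM))  # gating-network weights
b_g = np.zeros(N_OPTIONS)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mixture_prob(s):
    """P(expert | s): gate-weighted mixture of per-option discriminators."""
    gate = softmax(s @ W_g.T + b_g)   # (n, K) soft option responsibilities
    d = sigmoid(s @ W_d.T + b_d)      # (n, K) per-option discriminator outputs
    return (gate * d).sum(axis=1), gate, d

lr = 0.5
for step in range(300):
    for s, label in ((expert, 1.0), (policy, 0.0)):
        p, gate, d = mixture_prob(s)
        err = (p - label)[:, None]                 # squared-error gradient wrt p
        grad_d = err * gate * d * (1 - d)          # back through each expert
        W_d -= lr * grad_d.T @ s / len(s)
        b_d -= lr * grad_d.mean(axis=0)
        grad_g = err * gate * (d - p[:, None])     # back through the gate logits
        W_g -= lr * grad_g.T @ s / len(s)
        b_g -= lr * grad_g.mean(axis=0)

# After training, expert states should score higher than policy states;
# the gate separates the two expert clusters into distinct options.
p_e, _, _ = mixture_prob(expert)
p_p, _, _ = mixture_prob(policy)
print(p_e.mean(), p_p.mean())
```

Note the design point this toy makes concrete: the expert data here is not separable from the policy data by any single linear discriminator (the clusters sit in opposite corners), but the gated mixture handles it because each option only needs to explain its own cluster, which mirrors the motivation for decomposing a single reward into reward options.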

Cite

Text

Henderson et al. "OptionGAN: Learning Joint Reward-Policy Options Using Generative Adversarial Inverse Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2018. doi:10.1609/AAAI.V32I1.11775

Markdown

[Henderson et al. "OptionGAN: Learning Joint Reward-Policy Options Using Generative Adversarial Inverse Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2018.](https://mlanthology.org/aaai/2018/henderson2018aaai-optiongan/) doi:10.1609/AAAI.V32I1.11775

BibTeX

@inproceedings{henderson2018aaai-optiongan,
  title     = {{OptionGAN: Learning Joint Reward-Policy Options Using Generative Adversarial Inverse Reinforcement Learning}},
  author    = {Henderson, Peter and Chang, Wei-Di and Bacon, Pierre-Luc and Meger, David and Pineau, Joelle and Precup, Doina},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2018},
  pages     = {3199--3206},
  doi       = {10.1609/AAAI.V32I1.11775},
  url       = {https://mlanthology.org/aaai/2018/henderson2018aaai-optiongan/}
}