Generative Adversarial Imitation Learning

Abstract

Consider learning a policy from example expert behavior, without interaction with the expert or access to a reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.
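For context, the adversarial formulation referred to in the abstract is commonly summarized by the saddle-point objective below (a sketch following the paper's standard statement; here D is a discriminator over state-action pairs, π_E denotes the expert policy, H is the discounted causal entropy, and λ ≥ 0 weights the entropy regularizer):

\min_{\pi} \; \max_{D \in (0,1)^{\mathcal{S} \times \mathcal{A}}} \;
\mathbb{E}_{\pi}\bigl[\log D(s,a)\bigr]
+ \mathbb{E}_{\pi_E}\bigl[\log\bigl(1 - D(s,a)\bigr)\bigr]
- \lambda H(\pi)

In the paper's algorithm this objective is optimized by alternating gradient updates on the discriminator with policy-gradient (TRPO) steps that treat log D(s,a) as a cost, so the learned policy is driven toward occupancy measures the discriminator cannot distinguish from the expert's.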

Cite

Text

Ho and Ermon. "Generative Adversarial Imitation Learning." Neural Information Processing Systems, 2016.

Markdown

[Ho and Ermon. "Generative Adversarial Imitation Learning." Neural Information Processing Systems, 2016.](https://mlanthology.org/neurips/2016/ho2016neurips-generative/)

BibTeX

@inproceedings{ho2016neurips-generative,
  title     = {{Generative Adversarial Imitation Learning}},
  author    = {Ho, Jonathan and Ermon, Stefano},
  booktitle = {Neural Information Processing Systems},
  year      = {2016},
  pages     = {4565--4573},
  url       = {https://mlanthology.org/neurips/2016/ho2016neurips-generative/}
}