Maximum Entropy Inverse Reinforcement Learning
Abstract
Recent research has shown the benefit of framing problems of imitation learning as solutions to Markov Decision Problems. This approach reduces learning to the problem of recovering a utility function that makes the behavior induced by a near-optimal policy closely mimic demonstrated behavior. In this work, we develop a probabilistic approach based on the principle of maximum entropy. Our approach provides a well-defined, globally normalized distribution over decision sequences, while providing the same performance guarantees as existing methods. We develop our technique in the context of modeling real-world navigation and driving behaviors where collected data is inherently noisy and imperfect. Our probabilistic approach enables modeling of route preferences as well as a powerful new approach to inferring destinations and routes based on partial trajectories.
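The distribution the abstract describes weights each trajectory in proportion to the exponentiated reward of its features, and the reward weights are fit by matching demonstrated and expected feature counts. The sketch below is an illustrative Python implementation (not the paper's code) of this gradient for the simplest, exactly enumerable case; the function name and toy data are invented for illustration, and the paper itself computes expected feature counts with a dynamic-programming pass over states rather than enumerating paths.

```python
# Minimal sketch of the MaxEnt IRL gradient, assuming a small, fully
# enumerable set of candidate paths, each summarized by a feature-count
# vector f(path), with reward linear in features: theta @ f(path).
import numpy as np

def maxent_irl(path_features, demo_features, lr=0.1, iters=200):
    """Fit reward weights theta so that P(path) proportional to
    exp(theta @ f(path)) matches the demonstrated feature expectations.

    path_features : (num_paths, num_features) feature counts per path
    demo_features : (num_features,) mean feature counts of the demos
    """
    theta = np.zeros(path_features.shape[1])
    for _ in range(iters):
        # Globally normalized distribution over paths.
        scores = path_features @ theta
        probs = np.exp(scores - scores.max())  # subtract max for stability
        probs /= probs.sum()
        # Expected feature counts under the current model.
        expected = probs @ path_features
        # Log-likelihood gradient: empirical minus expected feature counts.
        theta += lr * (demo_features - expected)
    return theta

# Hypothetical usage: three candidate routes described by two features
# (e.g. total distance, number of turns); demos match the first route.
paths = np.array([[1.0, 0.0], [0.5, 1.0], [0.2, 2.0]])
demos = paths[0]
print(maxent_irl(paths, demos))
```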
Cite
Text
Ziebart et al. "Maximum Entropy Inverse Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2008. doi:10.1184/r1/6555512Markdown
[Ziebart et al. "Maximum Entropy Inverse Reinforcement Learning." AAAI Conference on Artificial Intelligence, 2008.](https://mlanthology.org/aaai/2008/ziebart2008aaai-maximum/) doi:10.1184/r1/6555512BibTeX
@inproceedings{ziebart2008aaai-maximum,
title = {{Maximum Entropy Inverse Reinforcement Learning}},
author = {Ziebart, Brian D. and Maas, Andrew L. and Bagnell, J. Andrew and Dey, Anind K.},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2008},
  pages = {1433--1438},
doi = {10.1184/r1/6555512},
url = {https://mlanthology.org/aaai/2008/ziebart2008aaai-maximum/}
}