Online Inverse Reinforcement Learning with Learned Observation Model
Abstract
Motivated by extending incremental inverse reinforcement learning (I2RL) to real-world robotics applications with noisy observations and an unknown observation model, we introduce a new method (RIMEO) that approximates the observation model in order to best estimate the noise-free ground truth underlying the observations. It learns a maximum entropy distribution over the observation features governing the perception process, and then uses the inferred observation model to learn the reward function. Experimental evaluation is performed on two robotics tasks: (1) post-harvest vegetable sorting with a Sawyer arm based on human demonstration, and (2) breaching a perimeter patrol by two Turtlebots. Our experiments reveal that RIMEO learns a more accurate policy than (a) a state-of-the-art IRL method that does not directly learn an observation model, and (b) a custom baseline that learns a less sophisticated observation model. Furthermore, we show that RIMEO admits formal guarantees of monotonic convergence and a sample complexity bound.
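The abstract's key step, fitting a maximum entropy distribution over observation features and then using it to recover the noise-free ground truth behind noisy observations, can be illustrated with a minimal sketch. The feature map `phi`, the shared discrete state/observation grid, the assumption that a small calibration set of (noisy observation, ground-truth state) pairs is available, and the plain gradient-ascent feature-matching fit below are all illustrative assumptions, not the authors' RIMEO implementation.

```python
"""Minimal sketch (not the authors' code): a maximum-entropy observation model
p(o|s) ∝ exp(theta · phi(o, s)) fit by feature matching, then used to decode
the most likely noise-free state from a noisy observation."""
import numpy as np

GRID = np.linspace(-1.0, 1.0, 21)  # assumed shared discrete state/observation grid

def phi(obs, state):
    """Hypothetical observation features: squared perception error plus a bias term."""
    return np.array([-(obs - state) ** 2, 1.0])

def fit_maxent_obs_model(pairs, lr=0.1, iters=300):
    """Fit theta so the model's expected features match the calibration pairs'."""
    theta = np.zeros(2)
    for _ in range(iters):
        grad = np.zeros_like(theta)
        for obs, state in pairs:
            feats = np.stack([phi(o, state) for o in GRID])  # features of candidate observations
            logits = feats @ theta
            p = np.exp(logits - logits.max())
            p /= p.sum()                                     # model distribution p(o|s)
            grad += phi(obs, state) - p @ feats              # empirical minus expected features
        theta += lr * grad / len(pairs)
    return theta

def decode_state(obs, theta):
    """MAP estimate of the noise-free state that best explains a noisy observation."""
    scores = np.array([phi(obs, s) @ theta for s in GRID])
    return GRID[int(np.argmax(scores))]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_states = rng.choice(GRID, size=200)
    noisy_obs = true_states + rng.normal(0.0, 0.1, size=200)  # simulated perception noise
    theta = fit_maxent_obs_model(list(zip(noisy_obs, true_states)))
    print("decoded state:", decode_state(0.37, theta))        # expected to land near 0.4
```

In the paper's setting, the trajectories decoded through the learned observation model would then feed the incremental (I2RL) reward learner; that downstream step is omitted from this sketch.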
Cite
Text
Arora et al. "Online Inverse Reinforcement Learning with Learned Observation Model." Conference on Robot Learning, 2022.
Markdown
[Arora et al. "Online Inverse Reinforcement Learning with Learned Observation Model." Conference on Robot Learning, 2022.](https://mlanthology.org/corl/2022/arora2022corl-online/)
BibTeX
@inproceedings{arora2022corl-online,
title = {{Online Inverse Reinforcement Learning with Learned Observation Model}},
author = {Arora, Saurabh and Doshi, Prashant and Banerjee, Bikramjit},
booktitle = {Conference on Robot Learning},
year = {2022},
pages = {1468--1477},
volume = {205},
url = {https://mlanthology.org/corl/2022/arora2022corl-online/}
}