Inverse Reinforcement Learning by Estimating Expertise of Demonstrators
Abstract
In Imitation Learning (IL), utilizing suboptimal and heterogeneous demonstrations presents a substantial challenge due to the varied nature of real-world data. However, standard IL algorithms consider these datasets as homogeneous, thereby inheriting the deficiencies of suboptimal demonstrators. Previous approaches to this issue rely on impractical assumptions like high-quality data subsets, confidence rankings, or explicit environmental knowledge. This paper introduces IRLEED, *Inverse Reinforcement Learning by Estimating Expertise of Demonstrators*, a novel framework that overcomes these hurdles without prior knowledge of demonstrator expertise. IRLEED enhances existing Inverse Reinforcement Learning (IRL) algorithms by combining a general model for demonstrator suboptimality to address reward bias and action variance, with a Maximum Entropy IRL framework to efficiently derive the optimal policy from diverse, suboptimal demonstrations. Experiments in both online and offline IL settings, with simulated and human-generated data, demonstrate IRLEED's adaptability and effectiveness, making it a versatile solution for learning from suboptimal demonstrations.
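As a rough illustration of the two kinds of suboptimality the abstract names, the sketch below models a demonstrator as a Boltzmann (softmax) policy over per-action rewards in a single-state problem. This is an assumption made for illustration, not the paper's formulation: the function `boltzmann_policy`, the inverse temperature `beta` (standing in for action variance), and the additive `bias` term (standing in for reward bias) are all hypothetical names.

```python
import numpy as np

def boltzmann_policy(rewards, beta=1.0, bias=None):
    """Softmax policy over per-action rewards (illustrative assumption).

    beta: inverse temperature; a lower beta gives noisier action choices.
    bias: optional per-action offset modelling a demonstrator whose
          perceived reward deviates from the true reward.
    """
    perceived = np.asarray(rewards, dtype=float)
    if bias is not None:
        perceived = perceived + np.asarray(bias, dtype=float)
    logits = beta * perceived
    logits = logits - logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Hypothetical true rewards for three actions; action 0 is optimal.
true_rewards = [1.0, 0.5, 0.0]

expert = boltzmann_policy(true_rewards, beta=10.0)  # precise and unbiased
noisy = boltzmann_policy(true_rewards, beta=0.5)    # high action variance
biased = boltzmann_policy(true_rewards, beta=10.0, bias=[0.0, 1.0, 0.0])  # reward bias
```

Under these assumptions, the noisy demonstrator still prefers the optimal action but spreads probability across the others, while the biased demonstrator confidently prefers a suboptimal action. Pooling such demonstrations as if they came from one homogeneous expert would blur both effects into the learned policy.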
Cite
Text
Beliaev and Pedarsani. "Inverse Reinforcement Learning by Estimating Expertise of Demonstrators." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I15.33705
Markdown
[Beliaev and Pedarsani. "Inverse Reinforcement Learning by Estimating Expertise of Demonstrators." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/beliaev2025aaai-inverse/) doi:10.1609/AAAI.V39I15.33705
BibTeX
@inproceedings{beliaev2025aaai-inverse,
title = {{Inverse Reinforcement Learning by Estimating Expertise of Demonstrators}},
author = {Beliaev, Mark and Pedarsani, Ramtin},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {15532-15540},
doi = {10.1609/AAAI.V39I15.33705},
url = {https://mlanthology.org/aaai/2025/beliaev2025aaai-inverse/}
}