X-MEN: Guaranteed XOR-Maximum Entropy Constrained Inverse Reinforcement Learning
Abstract
Inverse Reinforcement Learning (IRL) is a powerful way of learning from demonstrations. In this paper, we address IRL problems in which prior knowledge is available that optimal policies will never violate certain constraints. Conventional approaches that ignore these constraints need many demonstrations to converge. We propose XOR-Maximum Entropy Constrained Inverse Reinforcement Learning (X-MEN), which is guaranteed to converge to the globally optimal reward function at a linear rate w.r.t. the number of learning iterations. X-MEN embeds XOR-sampling, a provable sampling approach that transforms the #P-complete sampling problem into queries to NP oracles, into the framework of maximum entropy IRL. X-MEN also guarantees that the learned IRL agent will never generate trajectories that violate constraints. Empirical results in navigation demonstrate that X-MEN converges faster to the optimal rewards than baseline approaches and always generates trajectories that satisfy multi-state combinatorial constraints.
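To make the role of the constraints concrete, here is a minimal sketch of constrained maximum entropy IRL on a toy problem. It assumes the standard maximum entropy IRL objective, in which P(tau) is proportional to exp(theta . f(tau)) and the log-likelihood gradient is the demonstrations' feature expectation minus the model's. The support of that distribution is restricted to constraint-satisfying trajectories, so violating trajectories receive zero probability. Everything specific here is hypothetical and not taken from the paper: the 4x4 grid, the right/down moves, the state-visitation features, the forbidden cell `(1, 1)`, and the names `rollout`, `features`, `distribution`, `log_likelihood`, and `train`. The model expectation is also computed by exact enumeration. That is precisely the #P-hard step X-MEN replaces with XOR-sampling, so enumeration only works for tiny problems like this one.

```python
import itertools
import math

# Hypothetical toy setup (not from the paper): a 4x4 grid, start at (0, 0),
# 6 moves, each "R" (right) or "D" (down), never leaving the grid.
SIZE = 4
HORIZON = 6
START = (0, 0)
FORBIDDEN = {(1, 1)}  # hypothetical hard constraint: never enter cell (1, 1)


def rollout(actions):
    """Visited cells for an action sequence, or None if a move leaves the grid."""
    r, c = START
    path = [(r, c)]
    for a in actions:
        r, c = (r, c + 1) if a == "R" else (r + 1, c)
        if r >= SIZE or c >= SIZE:
            return None
        path.append((r, c))
    return path


def features(path):
    """State-visitation counts as a vector of length SIZE * SIZE."""
    f = [0.0] * (SIZE * SIZE)
    for r, c in path:
        f[r * SIZE + c] += 1.0
    return f


# Support of the constrained distribution: every in-grid trajectory that
# satisfies the constraint. Violating trajectories get probability zero.
VALID = [p for p in (rollout(a) for a in itertools.product("RD", repeat=HORIZON))
         if p is not None and not FORBIDDEN.intersection(p)]
FEATS = [features(p) for p in VALID]


def distribution(theta):
    """P(tau) proportional to exp(theta . f(tau)), restricted to VALID."""
    scores = [sum(t * x for t, x in zip(theta, f)) for f in FEATS]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    z = sum(weights)
    return [w / z for w in weights]


def log_likelihood(theta, demos):
    """Average log-probability of the demonstrations under the model."""
    probs = distribution(theta)
    index = {tuple(p): i for i, p in enumerate(VALID)}
    return sum(math.log(probs[index[tuple(d)]]) for d in demos) / len(demos)


def train(demos, lr=0.1, steps=200):
    """Gradient ascent on the log-likelihood: grad = E_demo[f] - E_model[f]."""
    dim = SIZE * SIZE
    demo_f = [sum(features(d)[k] for d in demos) / len(demos) for k in range(dim)]
    theta = [0.0] * dim
    for _ in range(steps):
        probs = distribution(theta)
        # Exact expectation by enumeration. This is the #P-hard step that
        # X-MEN replaces with XOR-sampling; enumeration only scales to toys.
        model_f = [sum(p * f[k] for p, f in zip(probs, FEATS)) for k in range(dim)]
        theta = [t + lr * (d - m) for t, d, m in zip(theta, demo_f, model_f)]
    return theta


# Usage: one hypothetical demonstration that goes right along the top row, then down.
demos = [rollout("RRRDDD")]
theta = train(demos)
print(len(VALID), log_likelihood([0.0] * 16, demos), log_likelihood(theta, demos))
```

In this sketch, the guarantee that the learned agent never produces a violating trajectory follows directly from the distribution's support. Sampling from `distribution(theta)` can only return members of `VALID`. Training then only reshapes probability mass among the valid trajectories.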
Cite
Ding and Xue. "X-MEN: Guaranteed XOR-Maximum Entropy Constrained Inverse Reinforcement Learning." Uncertainty in Artificial Intelligence, 2022.
@inproceedings{ding2022uai-xmen,
title = {{X-MEN: Guaranteed XOR-Maximum Entropy Constrained Inverse Reinforcement Learning}},
author = {Ding, Fan and Xue, Yexiang},
booktitle = {Uncertainty in Artificial Intelligence},
year = {2022},
pages = {589-598},
volume = {180},
url = {https://mlanthology.org/uai/2022/ding2022uai-xmen/}
}