Learning Utilities from Demonstrations in Markov Decision Processes

Abstract

Although it is well-known that humans commonly engage in risk-sensitive behaviors in the presence of stochasticity, most Inverse Reinforcement Learning (IRL) models assume a risk-neutral agent. As such, they $(i)$ introduce model misspecification and $(ii)$ do not permit direct inference of the risk attitude of the observed agent, which can be useful in many applications. In this paper, we propose a novel model of behavior to cope with these issues. By allowing for risk sensitivity, our model alleviates $(i)$, and by explicitly representing risk attitudes through (learnable) utility functions, it solves $(ii)$. Then, we characterize the partial identifiability of an agent's utility under the new model and note that demonstrations from multiple environments mitigate the problem. We devise two provably efficient algorithms for learning utilities in a finite-data regime, and we conclude with some proof-of-concept experiments to validate both our model and our algorithms.
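To make the behavioral model described in the abstract concrete, the following is a minimal, hypothetical Python sketch (not the paper's algorithm, and all names are our own) of how a utility function applied to the cumulative return can encode a risk attitude: a risk-neutral agent maximizes expected return, while an agent with a concave utility may prefer a safer, lower-mean option.

```python
# Hypothetical illustration only: a toy one-step decision problem showing how
# a utility U over the return G changes which action maximizes E[U(G)].
import numpy as np

rng = np.random.default_rng(0)

def sample_return(risky: bool) -> float:
    # Risky action: reward 10 with probability 0.35, else 0.
    # Safe action: deterministic reward 3.
    if risky:
        return 10.0 if rng.random() < 0.35 else 0.0
    return 3.0

def expected_utility(risky: bool, utility, n: int = 100_000) -> float:
    # Monte Carlo estimate of E[U(G)].
    returns = np.array([sample_return(risky) for _ in range(n)])
    return float(np.mean(utility(returns)))

risk_neutral = lambda g: g                       # identity utility: expected return
risk_averse  = lambda g: 1.0 - np.exp(-0.5 * g)  # concave utility: risk-averse

for name, U in [("risk-neutral", risk_neutral), ("risk-averse", risk_averse)]:
    safe, risky = expected_utility(False, U), expected_utility(True, U)
    print(f"{name}: prefers {'risky' if risky > safe else 'safe'} "
          f"(safe={safe:.3f}, risky={risky:.3f})")
```

Under these (made-up) numbers the risk-neutral agent prefers the risky action (expected return 3.5 vs. 3.0), whereas the risk-averse agent prefers the safe one; an IRL model assuming risk neutrality would misread the latter behavior, which is the misspecification issue $(i)$ the abstract refers to.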

Cite

Text

Lazzati and Metelli. "Learning Utilities from Demonstrations in Markov Decision Processes." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Lazzati and Metelli. "Learning Utilities from Demonstrations in Markov Decision Processes." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/lazzati2025icml-learning/)

BibTeX

@inproceedings{lazzati2025icml-learning,
  title     = {{Learning Utilities from Demonstrations in Markov Decision Processes}},
  author    = {Lazzati, Filippo and Metelli, Alberto Maria},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {32704--32770},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/lazzati2025icml-learning/}
}