Towards Theoretical Understanding of Sequential Decision Making with Preference Feedback
Abstract
The success of sequential decision-making approaches, such as reinforcement learning (RL), is closely tied to the availability of reward feedback. However, designing a reward function that encodes the desired objective is a challenging task. In this work, we address a more realistic scenario: sequential decision making with preference feedback provided, for instance, by a human expert. We aim to build a theoretical basis linking preferences, (non-Markovian) utilities, and (Markovian) rewards, and we study the connections between them. First, we model preference feedback using a partial (pre)order over trajectories, enabling the presence of incomparabilities that are common when preferences are provided by humans but are surprisingly overlooked in existing work. Second, to provide a theoretical justification for a common practice, we investigate how a preference relation can be approximated by a multi-objective utility. We introduce a notion of preference-utility compatibility and analyze the computational complexity of this transformation, showing that constructing the minimum-dimensional utility is NP-hard. Third, we propose a novel concept of preference-based policy dominance that does not rely on utilities or rewards and discuss the computational complexity of assessing it. Fourth, we develop a computationally efficient algorithm to approximate a utility using (Markovian) rewards and quantify the error in terms of the suboptimality of the optimal policy induced by the approximating reward. This work aims to lay the foundation for a principled approach to sequential decision making from preference feedback, with promising potential applications in RL from human feedback.
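To make the notion of preference-utility compatibility mentioned in the abstract more concrete, the following is a minimal sketch, assuming a natural Pareto-dominance-based reading of compatibility between a partial preorder over trajectories and a multi-objective utility. The trajectory names, utility values, and the `is_compatible` helper are illustrative and are not the paper's formal definitions.

```python
# Sketch (assumed formalization): preferences are a set of strictly preferred
# trajectory pairs plus a set of incomparable pairs. A d-dimensional utility maps
# each trajectory to a vector; compatibility is checked via Pareto dominance:
#   tau1 preferred to tau2   =>  U(tau1) Pareto-dominates U(tau2)
#   tau1, tau2 incomparable  =>  neither utility vector dominates the other

def pareto_dominates(u, v):
    """True if u >= v componentwise with at least one strict inequality."""
    return all(x >= y for x, y in zip(u, v)) and any(x > y for x, y in zip(u, v))

def is_compatible(preferred, incomparable, utility):
    """Check whether a multi-objective utility respects the partial preorder."""
    for t1, t2 in preferred:
        if not pareto_dominates(utility[t1], utility[t2]):
            return False
    for t1, t2 in incomparable:
        if pareto_dominates(utility[t1], utility[t2]) or pareto_dominates(utility[t2], utility[t1]):
            return False
    return True

# Toy example: trajectory 'a' is preferred to 'b'; 'c' and 'd' are incomparable.
preferred = {("a", "b")}
incomparable = {("c", "d")}
utility = {"a": (2.0, 1.0), "b": (1.0, 0.5), "c": (3.0, 0.0), "d": (0.0, 3.0)}
print(is_compatible(preferred, incomparable, utility))  # True
```

Under this reading, finding the smallest dimension d for which such a compatible utility exists is the minimization problem the abstract reports to be NP-hard.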
Cite
Text
Drago et al. "Towards Theoretical Understanding of Sequential Decision Making with Preference Feedback." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Drago et al. "Towards Theoretical Understanding of Sequential Decision Making with Preference Feedback." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/drago2025icml-theoretical/)
BibTeX
@inproceedings{drago2025icml-theoretical,
title = {{Towards Theoretical Understanding of Sequential Decision Making with Preference Feedback}},
author = {Drago, Simone and Mussi, Marco and Metelli, Alberto Maria},
booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
year = {2025},
pages = {14499--14514},
volume = {267},
url = {https://mlanthology.org/icml/2025/drago2025icml-theoretical/}
}