Eliciting User Preferences for Personalized Multi-Objective Decision Making Through Comparative Feedback

Abstract

In this work, we propose a multi-objective decision-making framework that accommodates different user preferences over objectives, where preferences are learned via policy comparisons. Our model consists of a known Markov decision process with a vector-valued reward function, where each user has an unknown preference vector that expresses the relative importance of each objective. The goal is to efficiently compute a near-optimal policy for a given user. We consider two user feedback models. In the first, a user is presented with two policies and returns the one they prefer. In the second, a user is instead presented with two small weighted sets of representative trajectories and selects the preferred one. In both cases, we present an algorithm that finds a nearly optimal policy for the user using a number of comparison queries that scales quasilinearly in the number of objectives.
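To make the model concrete, below is a minimal Python sketch of comparison-based preference elicitation under simplifying assumptions: two objectives, candidate policies summarized by their vectors of expected returns, and a noiseless simulated user whose utility is the dot product of a hidden preference vector with a policy's return vector. The names user_prefers and elicit_weight_2d are illustrative, and this toy binary search is not the paper's algorithm, which handles many objectives with quasilinear query complexity.

import numpy as np

def user_prefers(v1, v2, w_true):
    """Simulated comparison oracle: does the user prefer value vector v1 over v2?"""
    return float(w_true @ v1) >= float(w_true @ v2)

def elicit_weight_2d(oracle, n_queries=20):
    """Binary-search the user's preference direction w = (cos t, sin t), t in [0, pi/2]."""
    lo, hi = 0.0, np.pi / 2
    for _ in range(n_queries):
        mid = (lo + hi) / 2
        # Probe direction orthogonal to w(mid): the utility gap w(t) @ (v1 - v2)
        # with v1 = d, v2 = -d equals 2*sin(t - mid), so the user's answer
        # reveals on which side of mid the hidden angle t lies.
        d = np.array([-np.sin(mid), np.cos(mid)])
        if oracle(d, -d):
            lo = mid  # user prefers d, so t >= mid
        else:
            hi = mid  # user prefers -d, so t <= mid
    t = (lo + hi) / 2
    return np.array([np.cos(t), np.sin(t)])

w_true = np.array([0.8, 0.6])  # hidden user preference (unit vector)
w_hat = elicit_weight_2d(lambda v1, v2: user_prefers(v1, v2, w_true))
print(w_hat)  # approximately [0.8, 0.6] after 20 queries

Each comparison halves the feasible region of preference directions, so the estimate of the user's weight vector converges exponentially fast in the number of queries; the paper's contribution is achieving comparable efficiency when comparisons are over policies or weighted trajectory sets rather than raw value vectors.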

Cite

Text

Shao et al. "Eliciting User Preferences for Personalized Multi-Objective Decision Making Through Comparative Feedback." Neural Information Processing Systems, 2023.

Markdown

[Shao et al. "Eliciting User Preferences for Personalized Multi-Objective Decision Making Through Comparative Feedback." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/shao2023neurips-eliciting/)

BibTeX

@inproceedings{shao2023neurips-eliciting,
  title     = {{Eliciting User Preferences for Personalized Multi-Objective Decision Making Through Comparative Feedback}},
  author    = {Shao, Han and Cohen, Lee and Blum, Avrim and Mansour, Yishay and Saha, Aadirupa and Walter, Matthew},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/shao2023neurips-eliciting/}
}