Listwise Reward Estimation for Offline Preference-Based Reinforcement Learning

Abstract

In Reinforcement Learning (RL), designing precise reward functions remains a challenge, particularly when aligning with human intent. Preference-based RL (PbRL) was introduced to address this problem by learning reward models from human feedback. However, existing PbRL methods have limitations, as they often overlook second-order preferences, which indicate the relative strength of a preference. In this paper, we propose Listwise Reward Estimation (LiRE), a novel approach for offline PbRL that leverages second-order preference information by constructing a Ranked List of Trajectories (RLT), which can be built efficiently using the same ternary feedback type as traditional methods. To validate the effectiveness of LiRE, we propose a new offline PbRL dataset that objectively reflects the effect of the estimated rewards. Our extensive experiments on the dataset demonstrate the superiority of LiRE, i.e., outperforming state-of-the-art baselines even with modest feedback budgets and remaining robust to the amount of feedback and to feedback noise. Our code is available at https://github.com/chwoong/LiRE.
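The abstract notes that the Ranked List of Trajectories (RLT) can be built with the same ternary feedback (prefer / tie / disprefer) used in standard PbRL. Below is a minimal sketch of one way such a ranked list could be maintained by inserting trajectories via binary search over tied groups; the `compare(a, b)` oracle, the group-based data structure, and all names are illustrative assumptions, not the authors' implementation.

```python
def insert_into_rlt(rlt, new_traj, compare):
    """Insert a trajectory into a Ranked List of Trajectories (RLT).

    rlt: list of groups (lists of trajectories), ordered from least to most
         preferred; trajectories within a group are treated as equally preferred.
    compare(a, b): ternary feedback oracle returning -1 if a is less preferred
         than b, 0 if they are equally preferred, and +1 if a is more preferred.
    """
    lo, hi = 0, len(rlt)
    while lo < hi:
        mid = (lo + hi) // 2
        # Compare against a representative trajectory of the middle group.
        result = compare(new_traj, rlt[mid][0])
        if result == 0:
            rlt[mid].append(new_traj)   # tie: join the existing group
            return rlt
        elif result < 0:
            hi = mid                    # new_traj ranks below this group
        else:
            lo = mid + 1                # new_traj ranks above this group
    rlt.insert(lo, [new_traj])          # no tie found: create a new group
    return rlt


# Usage sketch: build an RLT from a pool of trajectory segments, requiring
# roughly log(number of groups) ternary queries per inserted segment.
# rlt = []
# for seg in segments:
#     rlt = insert_into_rlt(rlt, seg, compare=human_ternary_feedback)
```

The resulting list orders trajectories from least to most preferred, so relative positions in the list carry the second-order preference information that pairwise labels alone do not.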

Cite

Text

Choi et al. "Listwise Reward Estimation for Offline Preference-Based Reinforcement Learning." International Conference on Machine Learning, 2024.

Markdown

[Choi et al. "Listwise Reward Estimation for Offline Preference-Based Reinforcement Learning." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/choi2024icml-listwise/)

BibTeX

@inproceedings{choi2024icml-listwise,
  title     = {{Listwise Reward Estimation for Offline Preference-Based Reinforcement Learning}},
  author    = {Choi, Heewoong and Jung, Sangwon and Ahn, Hongjoon and Moon, Taesup},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {8651--8671},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/choi2024icml-listwise/}
}