HIP-RL: Hallucinated Inputs for Preference-Based Reinforcement Learning in Continuous Domains

Abstract

Preference-based Reinforcement Learning (PbRL) enables agents to learn policies from preferences between trajectories rather than from an explicit reward function. Previous approaches to PbRL either have been used successfully in real-world applications but lack theoretical understanding, or they come with strong theoretical guarantees that hold only in tabular settings. In this work, we propose a novel practical PbRL algorithm for continuous domains, called Hallucinated Inputs Preference-based RL (HIP-RL), which bridges this gap between theory and practice. HIP-RL parametrizes the set of transition models and uses hallucinated inputs to enable optimistic exploration in continuous state-action spaces by controlling the epistemic uncertainty. We derive regret bounds for HIP-RL and show that they are sublinear for Gaussian Process dynamics and reward models. Moreover, we experimentally demonstrate the effectiveness of HIP-RL.
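
For intuition, the hallucinated-inputs mechanism mentioned in the abstract can be read as giving the agent an auxiliary input that selects any next state inside the learned model's epistemic confidence region, so that optimistic exploration becomes a control problem over this augmented input space. The sketch below is a minimal illustration under that reading; the function names, the scalar toy model, and the confidence parameter beta are assumptions for exposition, not the paper's implementation.

```python
# Minimal sketch of a hallucinated-inputs optimistic transition
# (illustrative only; names and the toy model are assumptions).
import numpy as np

def hallucinated_step(mean_fn, std_fn, state, action, eta, beta=1.0):
    """Optimistic one-step prediction.

    mean_fn, std_fn : callables returning the learned model's predictive mean
                      and epistemic standard deviation at (state, action).
    eta             : hallucinated input in [-1, 1], treated as an extra
                      control that the planner/policy may optimize.
    beta            : confidence-scaling parameter for the uncertainty set.
    """
    mu = mean_fn(state, action)
    sigma = std_fn(state, action)
    # The hallucinated input picks a next state inside the model's
    # epistemic confidence region around the mean prediction.
    return mu + beta * sigma * np.clip(eta, -1.0, 1.0)

# Toy stand-in for a learned 1-D model (purely illustrative):
mean_fn = lambda s, a: s + 0.1 * a
std_fn = lambda s, a: 0.05 * (1.0 + np.abs(a))  # epistemic-uncertainty proxy

s_next_optimistic = hallucinated_step(mean_fn, std_fn, state=0.0, action=1.0, eta=1.0)
s_next_pessimistic = hallucinated_step(mean_fn, std_fn, state=0.0, action=1.0, eta=-1.0)
print(s_next_optimistic, s_next_pessimistic)
```

In this sketch, optimizing over eta jointly with the action lets a planner reach the most favorable transition the model cannot yet rule out, which is the sense in which hallucinated inputs steer epistemic uncertainty toward optimistic exploration.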

Cite

Text

Zhang and Ramponi. "HIP-RL: Hallucinated Inputs for Preference-Based Reinforcement Learning in Continuous Domains." ICML 2023 Workshops: MFPL, 2023.

Markdown

[Zhang and Ramponi. "HIP-RL: Hallucinated Inputs for Preference-Based Reinforcement Learning in Continuous Domains." ICML 2023 Workshops: MFPL, 2023.](https://mlanthology.org/icmlw/2023/zhang2023icmlw-hiprl/)

BibTeX

@inproceedings{zhang2023icmlw-hiprl,
  title     = {{HIP-RL: Hallucinated Inputs for Preference-Based Reinforcement Learning in Continuous Domains}},
  author    = {Zhang, Chen Bo Calvin and Ramponi, Giorgia},
  booktitle = {ICML 2023 Workshops: MFPL},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/zhang2023icmlw-hiprl/}
}