PIPER: Primitive-Informed Preference-Based Hierarchical Reinforcement Learning via Hindsight Relabeling

Singh, Utsav; Suttle, Wesley A.; Sadler, Brian M.; Namboodiri, Vinay P.; Bedi, Amrit

ICMLW 2024

Abstract

In this work, we introduce PIPER: Primitive-Informed Preference-based Hierarchical reinforcement learning via Hindsight Relabeling, a novel approach that leverages preference-based learning to learn a reward model, and subsequently uses this reward model to relabel higher-level replay buffers. Since this reward is unaffected by lower primitive behavior, our relabeling-based approach is able to mitigate non-stationarity, which is common in existing hierarchical approaches, and demonstrates impressive performance across a range of challenging sparse-reward tasks. Since obtaining human feedback is typically impractical, we propose to replace the human-in-the-loop approach with our primitive-in-the-loop approach, which generates feedback using sparse rewards provided by the environment. Moreover, in order to prevent infeasible subgoal prediction and avoid degenerate solutions, we propose primitive-informed regularization that conditions higher-level policies to generate feasible subgoals. We perform extensive experiments to show that PIPER mitigates non-stationarity in hierarchical reinforcement learning and achieves greater than 50% success rates in challenging, sparse-reward robotic environments, where most other baselines fail to achieve any significant progress.
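
The abstract describes three steps: generating preference feedback from sparse environment rewards, fitting a reward model to those preferences, and relabeling the higher-level replay buffer with that model. The following is a minimal Python sketch of that flow, not the authors' implementation. The Bradley-Terry preference loss is an assumption (a common choice in preference-based RL that the abstract does not name), and the function names `preference_label`, `bradley_terry_loss`, and `relabel`, as well as the `(state, subgoal, reward, next_state)` transition layout, are hypothetical.

```python
import math


def preference_label(sparse_rewards_a, sparse_rewards_b):
    """Primitive-in-the-loop feedback: prefer the trajectory with the
    higher summed sparse environment reward (assumed comparison rule).
    Returns 1.0 if A is preferred, 0.0 if B is preferred, 0.5 on a tie."""
    return_a, return_b = sum(sparse_rewards_a), sum(sparse_rewards_b)
    if return_a > return_b:
        return 1.0
    if return_a < return_b:
        return 0.0
    return 0.5


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bradley_terry_loss(pred_return_a, pred_return_b, label):
    """Cross-entropy between the preference label and the Bradley-Terry
    probability P(A preferred) = sigmoid(pred_return_a - pred_return_b)."""
    p_a = _sigmoid(pred_return_a - pred_return_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p_a + eps)
             + (1.0 - label) * math.log(1.0 - p_a + eps))


def relabel(transitions, reward_model):
    """Replace the stored higher-level reward in each transition with the
    learned reward model's output, leaving the other fields untouched."""
    return [(s, g, reward_model(s, g), s_next)
            for (s, g, _, s_next) in transitions]
```

In this sketch, `reward_model` is any callable that maps a state and subgoal to a scalar. Because the relabeled reward comes from the learned model and not from the lower-level primitive's current behavior, it stays fixed as the primitive changes, which is the property the abstract credits with mitigating non-stationarity.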

Cite

Text

Singh et al. "PIPER: Primitive-Informed Preference-Based Hierarchical Reinforcement Learning via Hindsight Relabeling." ICML 2024 Workshops: ARLET, 2024.

Markdown

[Singh et al. "PIPER: Primitive-Informed Preference-Based Hierarchical Reinforcement Learning via Hindsight Relabeling." ICML 2024 Workshops: ARLET, 2024.](https://mlanthology.org/icmlw/2024/singh2024icmlw-piper/)

BibTeX

@inproceedings{singh2024icmlw-piper,
  title     = {{PIPER: Primitive-Informed Preference-Based Hierarchical Reinforcement Learning via Hindsight Relabeling}},
  author    = {Singh, Utsav and Suttle, Wesley A. and Sadler, Brian M. and Namboodiri, Vinay P. and Bedi, Amrit},
  booktitle = {ICML 2024 Workshops: ARLET},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/singh2024icmlw-piper/}
}