Offline Hierarchical Reinforcement Learning via Inverse Optimization

Schmidt, Carolin; Gammelli, Daniele; Harrison, James; Pavone, Marco; Rodrigues, Filipe

ICLR 2025

Abstract

Hierarchical policies enable strong performance in many sequential decision-making problems, such as those with high-dimensional action spaces, those requiring long-horizon planning, and settings with sparse rewards. However, learning hierarchical policies from static offline datasets presents a significant challenge. Crucially, actions taken by higher-level policies may not be directly observable within hierarchical controllers, and the offline dataset might have been generated using a different policy structure, hindering the use of standard offline learning algorithms. In this work, we propose $\textit{OHIO}$: a framework for offline reinforcement learning (RL) of hierarchical policies. Our framework leverages knowledge of the policy structure to solve the $\textit{inverse problem}$, recovering the unobservable high-level actions that likely generated the observed data under our hierarchical policy. This approach constructs a dataset suitable for off-the-shelf offline training. We demonstrate our framework on robotic and network optimization problems and show that it substantially outperforms end-to-end RL methods and improves robustness. We investigate a variety of instantiations of our framework, both in direct deployment of policies trained offline and when online fine-tuning is performed. Code and data are available at https://ohio-offline-hierarchical-rl.github.io.
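The inverse problem described in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a hypothetical low-level controller that moves the state toward a goal with a fixed linear gain, and treats that goal as the unobserved high-level action. The names `DT`, `K`, `low_level_step`, `recover_high_level_action`, and `relabel` are all illustrative assumptions. Because the assumed controller is linear in the goal, the inverse problem reduces to a least-squares solve; a nonlinear controller would need a numerical optimizer instead.

```python
import numpy as np

# Hypothetical constants for the toy low-level controller (assumptions, not from the paper).
DT = 0.1                                  # integration step
K = np.array([[2.0, 0.0], [0.0, 1.5]])    # proportional gain matrix


def low_level_step(state, goal):
    """Assumed low-level policy plus dynamics: step the state toward `goal`."""
    control = K @ (goal - state)
    return state + DT * control


def recover_high_level_action(state, next_state):
    """Inverse problem: find the goal that best explains an observed transition.

    Minimizes ||low_level_step(state, goal) - next_state||^2 over `goal`.
    Under the linear controller above this is an ordinary least-squares problem:
    (DT * K) @ goal = next_state - state + (DT * K) @ state.
    """
    A = DT * K
    b = next_state - state + A @ state
    goal, *_ = np.linalg.lstsq(A, b, rcond=None)
    return goal


def relabel(states):
    """Turn a state trajectory into (state, recovered_goal, next_state) tuples."""
    return [
        (s, recover_high_level_action(s, s_next), s_next)
        for s, s_next in zip(states[:-1], states[1:])
    ]


# Usage: generate a transition with a known goal, then recover it from the states alone.
true_goal = np.array([1.0, -2.0])
s0 = np.array([0.0, 0.0])
s1 = low_level_step(s0, true_goal)
recovered_goal = recover_high_level_action(s0, s1)
dataset = relabel([s0, s1, low_level_step(s1, true_goal)])
```

The relabeled tuples play the role of the dataset the abstract describes: transitions annotated with recovered high-level actions, which could then be passed to an off-the-shelf offline RL algorithm.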

Cite

Text

Schmidt et al. "Offline Hierarchical Reinforcement Learning via Inverse Optimization." International Conference on Learning Representations, 2025.

Markdown

[Schmidt et al. "Offline Hierarchical Reinforcement Learning via Inverse Optimization." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/schmidt2025iclr-offline/)

BibTeX

@inproceedings{schmidt2025iclr-offline,
  title     = {{Offline Hierarchical Reinforcement Learning via Inverse Optimization}},
  author    = {Schmidt, Carolin and Gammelli, Daniele and Harrison, James and Pavone, Marco and Rodrigues, Filipe},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/schmidt2025iclr-offline/}
}