Bilevel Reinforcement Learning via the Development of Hyper-Gradient Without Lower-Level Convexity

Abstract

Bilevel reinforcement learning (RL), which features two intertwined levels of problems, has attracted growing interest recently. The inherent non-convexity of the lower-level RL problem, however, is an impediment to developing bilevel optimization methods. By employing the fixed-point equation associated with regularized RL, we characterize the hyper-gradient via fully first-order information, thereby circumventing the assumption of lower-level convexity. Remarkably, this distinguishes our development of the hyper-gradient from general AID-based bilevel frameworks, since we exploit the specific structure of RL problems. Moreover, we design both model-based and model-free bilevel reinforcement learning algorithms, facilitated by access to the fully first-order hyper-gradient. Both algorithms enjoy a convergence rate of $\mathcal{O}\left(\epsilon^{-1}\right)$. To extend applicability, a stochastic version of the model-free algorithm is proposed, along with results on its convergence rate and sample complexity. In addition, numerical experiments demonstrate that the hyper-gradient indeed serves as an integration of exploitation and exploration.
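The following toy sketch illustrates the general idea of differentiating through the fixed point of a regularized lower-level problem; it is not the paper's algorithm. It uses an entropy-regularized bandit, whose optimal policy has the closed-form fixed point $\pi^*(x) = \mathrm{softmax}(r(x)/\tau)$, and computes the hyper-gradient of an upper-level objective $f(\pi^*(x))$ with only first-order quantities. All names (`lower_reward`, the weights `w`, the regularization `tau`) are illustrative assumptions:

```python
import numpy as np

tau = 0.5                       # entropy regularization strength (toy choice)
w = np.array([1.0, -2.0, 0.5])  # upper-level objective weights (toy choice)

def softmax(z):
    z = z - z.max()             # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def lower_reward(x):
    # lower-level reward parameterized by the upper-level variable x
    return np.array([x, x**2, 1.0 - x])

def d_lower_reward(x):
    # derivative of the lower-level reward with respect to x
    return np.array([1.0, 2.0 * x, -1.0])

def hyper_objective(x):
    pi = softmax(lower_reward(x) / tau)  # fixed point of the regularized problem
    return w @ pi                        # upper-level objective f(pi*(x))

def hyper_gradient(x):
    pi = softmax(lower_reward(x) / tau)
    # chain rule through the fixed point: the softmax Jacobian is
    # diag(pi) - pi pi^T, applied to the scaled reward derivative
    dpi_dx = (np.diag(pi) - np.outer(pi, pi)) @ (d_lower_reward(x) / tau)
    return w @ dpi_dx

x0 = 0.3
g_analytic = hyper_gradient(x0)
eps = 1e-6
g_fd = (hyper_objective(x0 + eps) - hyper_objective(x0 - eps)) / (2 * eps)
print(abs(g_analytic - g_fd))  # should be tiny: both compute the same gradient
```

In this one-state setting the fixed point is available in closed form, so implicit differentiation reduces to an explicit chain rule; in the full RL setting the fixed-point equation plays the analogous role without requiring lower-level convexity.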

Cite

Text

Yang et al. "Bilevel Reinforcement Learning via the Development of Hyper-Gradient Without Lower-Level Convexity." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.

Markdown

[Yang et al. "Bilevel Reinforcement Learning via the Development of Hyper-Gradient Without Lower-Level Convexity." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.](https://mlanthology.org/aistats/2025/yang2025aistats-bilevel/)

BibTeX

@inproceedings{yang2025aistats-bilevel,
  title     = {{Bilevel Reinforcement Learning via the Development of Hyper-Gradient Without Lower-Level Convexity}},
  author    = {Yang, Yan and Gao, Bin and Yuan, Ya-xiang},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  year      = {2025},
  pages     = {4780--4788},
  volume    = {258},
  url       = {https://mlanthology.org/aistats/2025/yang2025aistats-bilevel/}
}