Bilevel Reinforcement Learning via the Development of Hyper-Gradient Without Lower-Level Convexity
Abstract
Bilevel reinforcement learning (RL), which features two intertwined levels of problems, has attracted growing interest recently. The inherent non-convexity of the lower-level RL problem is, however, an impediment to developing bilevel optimization methods. By employing the fixed-point equation associated with regularized RL, we characterize the hyper-gradient via fully first-order information, thus circumventing the assumption of lower-level convexity. Remarkably, this distinguishes our development of the hyper-gradient from general AID-based bilevel frameworks, since we exploit the specific structure of RL problems. Moreover, access to the fully first-order hyper-gradient enables us to design both model-based and model-free bilevel reinforcement learning algorithms, both of which enjoy a convergence rate of $\mathcal{O}\left(\epsilon^{-1}\right)$. To extend applicability, a stochastic version of the model-free algorithm is proposed, along with results on its convergence rate and sample complexity. In addition, numerical experiments demonstrate that the hyper-gradient indeed serves as an integration of exploitation and exploration.
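To fix ideas, the kind of problem the abstract describes can be sketched as follows; the notation here ($x$, $F$, $J_\tau$, $\tau$) is illustrative and not necessarily the paper's own:

```latex
% A generic bilevel RL problem: an upper-level objective F depends on
% the solution of an (entropy-)regularized lower-level RL problem.
\min_{x} \; \Phi(x) := F\bigl(x, \pi^*(x)\bigr)
\quad \text{s.t.} \quad
\pi^*(x) \in \operatorname*{arg\,max}_{\pi} \; J_\tau(x, \pi).

% With regularization strength \tau > 0, the lower-level solution is the
% unique fixed point of a softmax-type operator,
\pi^*(x)(\cdot \mid s)
  = \operatorname{softmax}\!\Bigl( Q^{\pi^*(x)}_{\tau}(s, \cdot) / \tau \Bigr),

% and implicitly differentiating this fixed-point equation lets one express
% the hyper-gradient through first-order quantities alone,
\nabla \Phi(x)
  = \nabla_x F\bigl(x, \pi^*(x)\bigr)
  + \bigl(\partial_x \pi^*(x)\bigr)^{\!\top} \nabla_\pi F\bigl(x, \pi^*(x)\bigr).
```

The key point, per the abstract, is that the regularized fixed-point structure makes this hyper-gradient computable without assuming convexity of the lower-level problem.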
Cite
Text
Yang et al. "Bilevel Reinforcement Learning via the Development of Hyper-Gradient Without Lower-Level Convexity." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.
BibTeX
@inproceedings{yang2025aistats-bilevel,
title = {{Bilevel Reinforcement Learning via the Development of Hyper-Gradient Without Lower-Level Convexity}},
author = {Yang, Yan and Gao, Bin and Yuan, Ya-xiang},
booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
year = {2025},
pages = {4780-4788},
volume = {258},
url = {https://mlanthology.org/aistats/2025/yang2025aistats-bilevel/}
}