Meta-Learning Population-Based Methods for Reinforcement Learning
Abstract
Reinforcement learning (RL) algorithms are highly sensitive to their hyperparameter settings. Recently, numerous methods have been proposed to optimize these hyperparameters dynamically during training. One prominent approach is Population-Based Bandits (PB2), which uses a time-varying Gaussian process (GP) to tune the hyperparameters of a population of parallel agents. Despite its strong overall performance, PB2 starts slowly because the GP initially lacks sufficient information. To mitigate this issue, we propose four methods that utilize meta-data from various environments. These approaches are novel in that they adapt meta-learning methods to the time-varying setting. Among them, MultiTaskPB2, which applies meta-learning to the surrogate model, stands out as the most promising: it outperforms PB2 and other baselines in both anytime and final performance across two families of RL environments.
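To make the PB2 loop described above concrete, the following is a minimal, illustrative Python sketch, not the authors' implementation: a small population trains in parallel, and at each exploit-and-explore point the weaker half copies the weights of the stronger half and receives new hyperparameters proposed by a GP surrogate through a UCB acquisition. The toy train_step reward, the single learning-rate hyperparameter, the acquisition constant, and the use of time as an ordinary GP input (rather than PB2's time-varying kernel) are all simplifying assumptions made only to keep the sketch self-contained and runnable.

# Illustrative PB2-style loop (a sketch under the assumptions stated above).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
POP_SIZE, STEPS, T_READY = 4, 40, 5          # population size, training intervals, exploit period


def train_step(weights, lr, t):
    """Toy stand-in for one RL training interval; the best learning rate drifts over time."""
    best_lr = 10 ** (-3 + 2 * t / STEPS)     # time-varying optimum (assumption for illustration)
    gain = np.exp(-(np.log10(lr) - np.log10(best_lr)) ** 2) + 0.05 * rng.normal()
    return weights + gain, gain              # new "weights", per-interval reward improvement


# Population state: [weights, learning rate, last reward improvement]
population = [[0.0, 10 ** rng.uniform(-4, -1), 0.0] for _ in range(POP_SIZE)]
history_X, history_y = [], []                # (log10 lr, t) -> observed improvement

for t in range(1, STEPS + 1):
    for member in population:
        member[0], member[2] = train_step(member[0], member[1], t)
        history_X.append([np.log10(member[1]), t])
        history_y.append(member[2])

    if t % T_READY == 0:                     # exploit-and-explore point
        population.sort(key=lambda m: m[2])  # ascending: weakest members first
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(np.array(history_X), np.array(history_y))

        for weak, strong in zip(population[: POP_SIZE // 2],
                                population[-(POP_SIZE // 2):]):
            weak[0] = strong[0]              # copy the weights of a top performer
            # Propose a new learning rate by maximizing a UCB acquisition over random candidates.
            cand = 10 ** rng.uniform(-4, -1, size=256)
            X_cand = np.column_stack([np.log10(cand), np.full(256, t + 1)])
            mu, sigma = gp.predict(X_cand, return_std=True)
            weak[1] = cand[np.argmax(mu + 1.0 * sigma)]

print("final per-interval rewards:", [round(m[2], 3) for m in population])

In the paper's setting, the surrogate additionally benefits from meta-data gathered on other environments; per the abstract, MultiTaskPB2 meta-learns this surrogate model, which is what mitigates the slow start that plain PB2 suffers when the GP has seen little data.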
Cite
Text
Hog et al. "Meta-Learning Population-Based Methods for Reinforcement Learning." Transactions on Machine Learning Research, 2025.
Markdown
[Hog et al. "Meta-Learning Population-Based Methods for Reinforcement Learning." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/hog2025tmlr-metalearning/)
BibTeX
@article{hog2025tmlr-metalearning,
  title = {{Meta-Learning Population-Based Methods for Reinforcement Learning}},
  author = {Hog, Johannes and Rajan, Raghu and Biedenkapp, André and Awad, Noor and Hutter, Frank and Nguyen, Vu},
  journal = {Transactions on Machine Learning Research},
  year = {2025},
  url = {https://mlanthology.org/tmlr/2025/hog2025tmlr-metalearning/}
}