Meta-Black-Box-Optimization Through Offline Q-Function Learning

Abstract

Recent progress in Meta-Black-Box-Optimization (MetaBBO) has demonstrated that using reinforcement learning (RL) to learn a meta-level policy for dynamic algorithm configuration (DAC) over a distribution of optimization tasks can significantly enhance the performance of the low-level BBO algorithm. However, the online learning paradigms of existing works limit the training efficiency of MetaBBO. To address this, we propose an offline learning-based MetaBBO framework, termed Q-Mamba, to attain both effectiveness and efficiency in MetaBBO. Specifically, we first transform the DAC task into a long-sequence decision process. This allows us to further introduce an effective Q-function decomposition mechanism that reduces the learning difficulty within the intricate algorithm configuration space. Under this setting, we propose three novel designs to meta-learn a DAC policy from offline data. First, we propose a novel collection strategy for constructing an offline DAC experience dataset with balanced exploration and exploitation. Second, we establish a decomposition-based Q-loss that incorporates conservative Q-learning to promote stable learning from the offline dataset. Third, to further improve offline learning efficiency, we equip our framework with a Mamba architecture, which improves the effectiveness and efficiency of long-sequence learning through a selective state space model and a hardware-aware parallel scan, respectively. Through extensive benchmarking, we observe that Q-Mamba achieves competitive or even superior performance compared to prior online/offline baselines, while significantly improving on the training efficiency of existing online baselines. We provide the source code of Q-Mamba at https://github.com/MetaEvo/Q-Mamba.
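The abstract does not spell out the form of the Q-loss, so the following is only a minimal sketch of the generic conservative Q-learning penalty on a single discretized action dimension. It is not the Q-Mamba loss. The function names, the `alpha` weight, and the assumption that each hyperparameter is discretized into bins are all hypothetical and chosen for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def conservative_q_loss(q_values, action, td_target, alpha=1.0):
    """Squared TD error plus a conservative penalty for one action dimension.

    q_values  : Q-value for every discrete bin of this dimension (assumed setup)
    action    : index of the bin that appears in the offline dataset
    td_target : bootstrapped target for the logged action
    alpha     : hypothetical weight of the conservative term
    """
    q_taken = q_values[action]
    td_error = 0.5 * (q_taken - td_target) ** 2
    # Pushes down Q-values of all bins while pushing up the logged bin,
    # which discourages overestimation of actions absent from the data.
    penalty = logsumexp(q_values) - q_taken
    return td_error + alpha * penalty


# Usage: two bins with equal Q-values and a zero target.
loss = conservative_q_loss([0.0, 0.0], action=0, td_target=0.0)
```

In a decomposed setting, one would presumably apply a term of this kind to each action dimension in the sequence and sum the results, though the exact targets used for each dimension are not stated in the abstract.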

Cite

Text

Ma et al. "Meta-Black-Box-Optimization Through Offline Q-Function Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Ma et al. "Meta-Black-Box-Optimization Through Offline Q-Function Learning." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/ma2025icml-metablackboxoptimization/)

BibTeX

@inproceedings{ma2025icml-metablackboxoptimization,
  title     = {{Meta-Black-Box-Optimization Through Offline Q-Function Learning}},
  author    = {Ma, Zeyuan and Cao, Zhiguang and Jiang, Zhou and Guo, Hongshu and Gong, Yue-Jiao},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {41807-41826},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/ma2025icml-metablackboxoptimization/}
}