Reinforcement Learning for Adaptive MCMC

Abstract

An informal observation, made by several authors, is that the adaptive design of a Markov transition kernel has the flavour of a reinforcement learning task. Yet, to date, it has remained unclear how to exploit modern reinforcement learning technologies for adaptive MCMC. The aim of this paper is to set out a general framework, called \emph{Reinforcement Learning Metropolis–Hastings}, that is theoretically supported and empirically validated. Our principal focus is on learning fast-mixing Metropolis–Hastings transition kernels, which we cast as deterministic policies and optimise via a policy gradient. Control of the learning rate provably ensures that conditions for ergodicity are satisfied. The methodology is used to construct a gradient-free sampler that outperforms a popular gradient-free adaptive Metropolis–Hastings algorithm on $\approx$90% of tasks in the \emph{PosteriorDB} benchmark.
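To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of the general recipe the abstract describes: treat a Metropolis–Hastings proposal as a deterministic policy, here reduced to a single log step-size parameter theta, adapt it with a policy-gradient-style update of a per-step reward (expected squared jump distance is used purely for illustration), and let the learning rate decay so that adaptation diminishes over time. The target density, reward, and learning-rate schedule are illustrative assumptions, not choices taken from the paper.

import numpy as np

def log_target(x):
    # Illustrative target: zero-mean 2D Gaussian with correlation 0.9 (unnormalised log density).
    cov = np.array([[1.0, 0.9], [0.9, 1.0]])
    return -0.5 * x @ np.linalg.solve(cov, x)

def smoothed_reward(theta, x, z):
    # Expected squared jump distance for the proposal x' = x + exp(theta) * z,
    # conditional on the noise z; piecewise differentiable in theta.
    x_prop = x + np.exp(theta) * z
    accept_prob = np.exp(min(0.0, log_target(x_prop) - log_target(x)))
    return accept_prob * np.sum((x_prop - x) ** 2)

rng = np.random.default_rng(0)
x = np.zeros(2)          # current state of the chain
theta = np.log(0.1)      # log step size: the "deterministic policy" parameter
n_steps, eps = 20_000, 1e-4
samples = np.empty((n_steps, 2))

for t in range(n_steps):
    # Ordinary random-walk Metropolis–Hastings step with the current step size.
    z = rng.standard_normal(2)
    x_prop = x + np.exp(theta) * z
    if np.log(rng.uniform()) < log_target(x_prop) - log_target(x):
        x = x_prop
    samples[t] = x

    # Policy-gradient-style update: finite-difference gradient of the smoothed reward
    # with respect to theta, applied with a diminishing (Robbins–Monro style) learning
    # rate so that adaptation vanishes as t grows.
    grad = (smoothed_reward(theta + eps, x, z) - smoothed_reward(theta - eps, x, z)) / (2 * eps)
    theta += 0.1 / (t + 1) ** 0.6 * grad

print("learned step size:", np.exp(theta))
print("posterior mean estimate:", samples[n_steps // 2:].mean(axis=0))

The decaying learning rate mirrors the diminishing-adaptation condition that the abstract alludes to when it says control of the learning rate ensures ergodicity; the specific exponent 0.6 and the finite-difference gradient are assumptions made only to keep this sketch short and self-contained.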

Cite

Text

Wang et al. "Reinforcement Learning for Adaptive MCMC." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.

Markdown

[Wang et al. "Reinforcement Learning for Adaptive MCMC." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.](https://mlanthology.org/aistats/2025/wang2025aistats-reinforcement/)

BibTeX

@inproceedings{wang2025aistats-reinforcement,
  title     = {{Reinforcement Learning for Adaptive MCMC}},
  author    = {Wang, Congye and Chen, Wilson Ye and Kanagawa, Heishiro and Oates, Chris J.},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  year      = {2025},
  pages     = {640-648},
  volume    = {258},
  url       = {https://mlanthology.org/aistats/2025/wang2025aistats-reinforcement/}
}