Posterior Sampling for Reinforcement Learning on Graphs

Arnaud Robert, Aldo A. Faisal, Ciara Pike-Burke

TMLR 2025

/tmlr/2025/robert2025tmlr-posterior/

Abstract

Many Markov Decision Processes (MDPs) exhibit structure in their state and action spaces that is not exploited. We consider the case where the structure can be modelled using a directed acyclic graph (DAG) composed of nodes and edges. In this case, each node has a state, and the state transition dynamics are influenced by the states and actions at its parent nodes. We propose an MDP framework, \emph{Directed Acyclic Markov Decision Process} (DAMDP) that formalises this problem, and we develop algorithms to perform planning and learning. Crucially, DAMDPs retain many of the benefits of MDPs, as we can show that Dynamic Programming can find the optimal policy in known DAMDPs. We also demonstrate how to perform Reinforcement Learning in DAMDPs when the transition probabilities and the reward function are unknown. To this end, we derive a posterior sampling-based algorithm that is able to leverage the graph structure to boost learning efficiency. Moreover, we obtain a theoretical bound on the Bayesian regret for this algorithm, which directly shows the efficiency gain from considering the graph structure. We then conclude by empirically demonstrating that by harnessing the DAMDP, our algorithm outperforms traditional posterior sampling for Reinforcement Learning in both a maximum flow problem and a real-world wind farm optimisation task.

PDF TMLR Code Semantic Scholar

Cite

Text

Robert et al. "Posterior Sampling for Reinforcement Learning on Graphs." Transactions on Machine Learning Research, 2025.

Markdown

[Robert et al. "Posterior Sampling for Reinforcement Learning on Graphs." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/robert2025tmlr-posterior/)

BibTeX

@article{robert2025tmlr-posterior,
  title     = {{Posterior Sampling for Reinforcement Learning on Graphs}},
  author    = {Robert, Arnaud and Faisal, Aldo A. and Pike-Burke, Ciara},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/robert2025tmlr-posterior/}
}