Highway Graph to Accelerate Reinforcement Learning
Abstract
Reinforcement Learning (RL) algorithms often struggle with low training efficiency. A common approach to addressing this challenge is to integrate model-based planning algorithms, such as Monte Carlo Tree Search (MCTS) or Value Iteration (VI), which plan over an environmental model. However, VI faces a significant limitation: it requires iterating over a large tensor with dimensions $|\mathcal{S}|\times |\mathcal{A}| \times |\mathcal{S}|$, where $\mathcal{S}$ and $\mathcal{A}$ denote the state and action spaces, respectively. Each sweep propagates value backward, updating the value of the preceding state $s_{t-1}$ from that of the succeeding state $s_t$, which makes the process computationally intensive. To improve the training efficiency of RL algorithms, we propose speeding up the value-learning process. In deterministic environments with discrete state and action spaces, we observe that, on the sampled empirical state-transition graph, a non-branching sequence of transitions, which we term a \textit{highway}, takes the agent directly from $s_0$ to $s_T$ with no decision points at the intermediate states. On such highways, value updating can be collapsed into a single-step operation, eliminating iterative, step-by-step updates. Building on this observation, we introduce a novel graph structure, the \textit{highway graph}, to model state transitions. The highway graph compresses the transition model into a compact representation in which a single edge can encapsulate multiple state transitions, enabling value propagation across multiple time steps in one iteration. Integrating the highway graph into RL yields a model-based off-policy RL method whose training is significantly accelerated, particularly in the early stages. Experiments across four categories of environments demonstrate that our method learns significantly faster than established and state-of-the-art model-free and model-based RL algorithms (often by a factor of 10 to 150) while achieving equal or superior expected returns. Furthermore, a deep neural network-based agent trained with the highway graph exhibits improved generalization and reduced storage costs.
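To make the compression concrete, below is a minimal tabular sketch of the idea in Python. It is not the authors' implementation: the function names (build_highway_graph, highway_value_iteration), the (action, reward, next_state) transition format, and the discount factor are all illustrative assumptions; the paper's actual graph construction and value-learning procedure are described in the full text.

from collections import defaultdict

GAMMA = 0.99  # discount factor (illustrative choice, not from the paper)

def build_highway_graph(transitions):
    # `transitions` maps state -> list of (action, reward, next_state) tuples
    # observed in a deterministic environment. Maximal non-branching chains
    # (each interior state has exactly one incoming and one outgoing edge)
    # are folded into single "highway" edges that carry the accumulated
    # discounted reward and the number of underlying time steps.
    out_deg = {s: len(edges) for s, edges in transitions.items()}
    in_deg = defaultdict(int)
    for edges in transitions.values():
        for _, _, nxt in edges:
            in_deg[nxt] += 1

    def interior(s):
        # An interior state can be skipped over: one way in, one way out.
        return out_deg.get(s, 0) == 1 and in_deg[s] == 1

    highway = defaultdict(list)  # s -> [(first_action, reward_sum, steps, end_state)]
    for s, edges in transitions.items():
        if interior(s):
            continue  # this state gets absorbed into some highway edge
        for a, r, nxt in edges:
            total, steps, seen = r, 1, {s}
            while interior(nxt) and nxt not in seen:  # guard against cycles
                seen.add(nxt)
                _, r2, nxt2 = transitions[nxt][0]
                total += (GAMMA ** steps) * r2
                steps += 1
                nxt = nxt2
            highway[s].append((a, total, steps, nxt))
    return highway

def highway_value_iteration(highway, n_iters=100):
    # One backup per highway edge propagates value across `steps` original
    # time steps at once, instead of one step per sweep as in plain VI.
    V = defaultdict(float)
    for _ in range(n_iters):
        for s, edges in highway.items():
            V[s] = max(total + (GAMMA ** steps) * V[end]
                       for _, total, steps, end in edges)
    return V

# A 5-state corridor 0 -> 1 -> 2 -> 3 -> 4 with reward 1 on the final move
# collapses to a single highway edge, so the goal's value reaches state 0
# in one backup instead of four.
corridor = {0: [(0, 0.0, 1)], 1: [(0, 0.0, 2)],
            2: [(0, 0.0, 3)], 3: [(0, 1.0, 4)]}
print(highway_value_iteration(build_highway_graph(corridor))[0])  # ~ 0.99**3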
Cite
Text
Yin et al. "Highway Graph to Accelerate Reinforcement Learning." Transactions on Machine Learning Research, 2025.
Markdown
[Yin et al. "Highway Graph to Accelerate Reinforcement Learning." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/yin2025tmlr-highway/)
BibTeX
@article{yin2025tmlr-highway,
title = {{Highway Graph to Accelerate Reinforcement Learning}},
author = {Yin, Zidu and Zhang, Zhen and Gong, Dong and Albrecht, Stefano V. and Shi, Javen Qinfeng},
journal = {Transactions on Machine Learning Research},
year = {2025},
url = {https://mlanthology.org/tmlr/2025/yin2025tmlr-highway/}
}