BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces
Abstract
Offline reinforcement learning in high-dimensional, discrete action spaces is challenging due to the exponential scaling of the joint action space with the number of sub-actions and the complexity of modeling sub-action dependencies. Existing methods either exhaustively evaluate the action space, making them computationally infeasible, or factorize Q-values, failing to represent joint sub-action effects. We propose Branch Value Estimation (BraVE), a value-based method that uses tree-structured action traversal to evaluate a linear number of joint actions while preserving dependency structure. BraVE outperforms prior offline RL methods by up to 20× in environments with over four million actions.
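To make the tree-structured traversal concrete, below is a minimal sketch of the idea the abstract describes: a value network scores each candidate choice for the next sub-action conditioned on the state and the partial action built so far, so only a linear number of candidates is scored rather than the full combinatorial space. This is not the authors' implementation; `ToyBranchScorer`, `traverse`, and the prefix encoding (unset slots marked with -1) are all hypothetical illustration choices.

```python
import torch
import torch.nn as nn


class ToyBranchScorer(nn.Module):
    """Hypothetical stand-in for a branch-value network.

    Given the state and a partially built joint action, returns one
    branch value per candidate choice for the next sub-action slot.
    """

    def __init__(self, state_dim, num_sub_actions, num_choices):
        super().__init__()
        self.num_sub_actions = num_sub_actions
        self.net = nn.Sequential(
            nn.Linear(state_dim + num_sub_actions, 64),
            nn.ReLU(),
            nn.Linear(64, num_choices),
        )

    def forward(self, state, partial):
        # Encode the partial action as a fixed-length vector; -1 marks
        # sub-action slots that have not been chosen yet (an assumption
        # of this sketch, not the paper's encoding).
        enc = torch.full((self.num_sub_actions,), -1.0)
        enc[: len(partial)] = torch.tensor(partial, dtype=torch.float32)
        return self.net(torch.cat([state, enc]))


def traverse(scorer, state, num_sub_actions):
    """Greedily assemble a joint action one sub-action at a time.

    Because each choice is conditioned on the prefix, sub-action
    dependencies are preserved, yet only num_sub_actions * num_choices
    candidates are scored instead of num_choices ** num_sub_actions.
    """
    partial = []
    for _ in range(num_sub_actions):
        branch_values = scorer(state, partial)  # one value per choice
        partial.append(int(torch.argmax(branch_values)))
    return partial


if __name__ == "__main__":
    scorer = ToyBranchScorer(state_dim=8, num_sub_actions=5, num_choices=3)
    action = traverse(scorer, torch.randn(8), num_sub_actions=5)
    print(action)  # e.g. [2, 0, 1, 0, 2]
```

With 5 sub-actions of 3 choices each, the sketch scores 15 candidates rather than 3^5 = 243 joint actions; the gap between linear and exponential evaluation is what makes spaces with millions of joint actions tractable.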
Cite
Text
Landers et al. "BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces." Advances in Neural Information Processing Systems, 2025.

Markdown

[Landers et al. "BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/landers2025neurips-brave/)

BibTeX
@inproceedings{landers2025neurips-brave,
  title     = {{BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces}},
  author    = {Landers, Matthew and Killian, Taylor W. and Barnes, Hugo and Hartvigsen, Thomas and Doryab, Afsaneh},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/landers2025neurips-brave/}
}