BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces

Abstract

Offline reinforcement learning in high-dimensional, discrete action spaces is challenging due to the exponential scaling of the joint action space with the number of sub-actions and the complexity of modeling sub-action dependencies. Existing methods either exhaustively evaluate the action space, making them computationally infeasible, or factorize Q-values, failing to represent joint sub-action effects. We propose **Bra**nch **V**alue **E**stimation (BraVE), a value-based method that uses tree-structured action traversal to evaluate a linear number of joint actions while preserving dependency structure. BraVE outperforms prior offline RL methods by up to 20× in environments with over four million actions.
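
To make the linear-versus-exponential scaling concrete, below is a minimal, hypothetical sketch of greedy root-to-leaf traversal over a sub-action tree. The `branch_value` function, the problem sizes, and the pseudo-random stand-in scorer are illustrative assumptions, not the paper's implementation; in BraVE the scorer would be a learned network, and the traversal details may differ.

```python
import random

# Assumed problem sizes: n sub-actions, each with k discrete choices.
# 4**11 = 4,194,304 joint actions, matching the abstract's "over four million".
NUM_SUB_ACTIONS = 11
NUM_CHOICES = 4

def branch_value(state, partial_action):
    """Stand-in for a learned branch-value network.

    Scores a partial joint action (a prefix of sub-action choices).
    Here it is a deterministic pseudo-random function so the sketch runs.
    """
    rng = random.Random(hash((state, tuple(partial_action))))
    return rng.random()

def greedy_tree_action(state):
    """Greedy root-to-leaf traversal of the sub-action tree.

    At each depth the current prefix is extended with the child whose
    branch value is highest, so only n * k candidates are scored
    instead of all k**n joint actions.
    """
    partial = []
    evaluations = 0
    for _ in range(NUM_SUB_ACTIONS):
        best_choice, best_score = None, float("-inf")
        for choice in range(NUM_CHOICES):
            score = branch_value(state, partial + [choice])
            evaluations += 1
            if score > best_score:
                best_choice, best_score = choice, score
        partial.append(best_choice)
    return tuple(partial), evaluations

action, evals = greedy_tree_action(state=0)
print(f"joint action: {action}")
print(f"evaluations: {evals} vs exhaustive {NUM_CHOICES ** NUM_SUB_ACTIONS}")
```

Under these assumed sizes, the traversal scores only 44 candidates rather than the 4,194,304 a naive argmax over the joint action space would require, while each choice is still conditioned on the sub-actions already fixed along the path.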

Cite

Text

Landers et al. "BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces." Advances in Neural Information Processing Systems, 2025.

Markdown

[Landers et al. "BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/landers2025neurips-brave/)

BibTeX

@inproceedings{landers2025neurips-brave,
  title     = {{BraVE: Offline Reinforcement Learning for Discrete Combinatorial Action Spaces}},
  author    = {Landers, Matthew and Killian, Taylor W. and Barnes, Hugo and Hartvigsen, Thomas and Doryab, Afsaneh},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/landers2025neurips-brave/}
}