Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing

Abstract

Value-function-based methods have long played an important role in reinforcement learning. However, finding the best next action given a value function of arbitrary complexity is nontrivial when the action space is too large for enumeration. We develop a framework for value-function-based deep reinforcement learning with a combinatorial action space, in which the action selection problem is explicitly formulated as a mixed-integer optimization problem. As a motivating example, we present an application of this framework to the capacitated vehicle routing problem (CVRP), a combinatorial optimization problem in which a set of locations must be covered by a single vehicle with limited capacity. On each instance, we model an action as the construction of a single route, and consider a deterministic policy which is improved through a simple policy iteration algorithm. Our approach is competitive with other reinforcement learning methods and achieves an average gap of 1.7% with state-of-the-art OR methods on standard library instances of medium size.

PDF NeurIPS Semantic Scholar

Cite

Text

Delarue et al. "Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing." Neural Information Processing Systems, 2020.

Markdown

[Delarue et al. "Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/delarue2020neurips-reinforcement/)

BibTeX

@inproceedings{delarue2020neurips-reinforcement,
  title     = {{Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing}},
  author    = {Delarue, Arthur and Anderson, Ross and Tjandraatmadja, Christian},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/delarue2020neurips-reinforcement/}
}