Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization

Abstract

Applying reinforcement learning (RL) to combinatorial optimization problems is attractive as it removes the need for expert knowledge or pre-solved instances. However, it is unrealistic to expect an agent to solve these (often NP-)hard problems in a single shot at inference due to their inherent complexity. Thus, leading approaches often implement additional search strategies, from stochastic sampling and beam-search to explicit fine-tuning. In this paper, we argue for the benefits of learning a population of complementary policies, which can be simultaneously rolled out at inference. To this end, we introduce Poppy, a simple training procedure for populations. Instead of relying on a predefined or hand-crafted notion of diversity, Poppy induces an unsupervised specialization targeted solely at maximizing the performance of the population. We show that Poppy produces a set of complementary policies, and obtains state-of-the-art RL results on three popular NP-hard problems: traveling salesman, capacitated vehicle routing, and job-shop scheduling.
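
For intuition, the population objective suggested by the title can be read as a winner-takes-all REINFORCE update: on each training instance, every policy in the population proposes a solution, but only the policy that produced the best one receives a gradient, with the second-best reward serving as a baseline. The snippet below is a minimal PyTorch-style sketch of that idea; the function name, tensor shapes, and the second-best baseline are illustrative assumptions, not code from the paper.

import torch

def winner_takes_all_loss(log_probs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """Illustrative winner-takes-all population loss (a sketch, not the paper's code).

    log_probs: (K, B) summed log-probabilities of the solutions sampled by each
               of K policies on a batch of B problem instances.
    rewards:   (K, B) corresponding returns (higher is better).
    """
    assert rewards.shape[0] >= 2, "needs a population of at least two policies"
    # Best-performing policy on each instance.
    best_rewards, best_idx = rewards.max(dim=0)                    # (B,)
    # Second-best reward as a baseline: the winner is only rewarded for the
    # margin by which it beats the rest of the population.
    second_best = rewards.sort(dim=0, descending=True).values[1]   # (B,)
    advantage = (best_rewards - second_best).detach()              # (B,)
    # Only the winner's log-probability enters the loss, so each policy is
    # free to specialize on the instances it already handles best.
    winner_log_probs = log_probs.gather(0, best_idx.unsqueeze(0)).squeeze(0)
    return -(advantage * winner_log_probs).mean()

At inference, as the abstract notes, all policies are rolled out simultaneously and the best solution per instance is kept, so it is the coverage of the population rather than any single policy that determines final performance.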

Cite

Text

Grinsztajn et al. "Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization." Neural Information Processing Systems, 2023.

Markdown

[Grinsztajn et al. "Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/grinsztajn2023neurips-winner/)

BibTeX

@inproceedings{grinsztajn2023neurips-winner,
  title     = {{Winner Takes It All: Training Performant RL Populations for Combinatorial Optimization}},
  author    = {Grinsztajn, Nathan and Furelos-Blanco, Daniel and Surana, Shikha and Bonnet, Clément and Barrett, Tom},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/grinsztajn2023neurips-winner/}
}