Towards a General Transfer Approach for Policy-Value Networks

Abstract

Transferring trained policies and value functions from one task to another, such as from one game to another with a different board size, board shape, or more substantial rule changes, is a challenging problem. Popular benchmarks for reinforcement learning (RL), such as Atari games and ProcGen, have limited variety, especially in terms of action spaces. Due to a focus on such benchmarks, the development of transfer methods that can also handle changes in action spaces has received relatively little attention. Furthermore, we argue that progress towards more general methods should include benchmarks where new problem instances can be described by domain experts, rather than machine learning experts, using convenient, high-level domain-specific languages (DSLs). In addition to enabling end users to more easily describe their problems, user-friendly DSLs also contain relevant task information which can be leveraged to make effective zero-shot transfer plausibly achievable. As an example, we use the Ludii general game system, which includes a highly varied set of over 1000 distinct games described in such a language. We propose a simple baseline approach for transferring fully convolutional policy-value networks, which are used to guide search agents similar to AlphaZero, between any pair of games modelled in this system. Extensive results, including various cases of highly successful zero-shot transfer, are provided for a wide variety of source and target games.

Cite

Text

Soemers et al. "Towards a General Transfer Approach for Policy-Value Networks." Transactions on Machine Learning Research, 2023.

Markdown

[Soemers et al. "Towards a General Transfer Approach for Policy-Value Networks." Transactions on Machine Learning Research, 2023.](https://mlanthology.org/tmlr/2023/soemers2023tmlr-general/)

BibTeX

@article{soemers2023tmlr-general,
  title     = {{Towards a General Transfer Approach for Policy-Value Networks}},
  author    = {Soemers, Dennis J. N. J. and Mella, Vegard and Piette, Eric and Stephenson, Matthew and Browne, Cameron and Teytaud, Olivier},
  journal   = {Transactions on Machine Learning Research},
  year      = {2023},
  url       = {https://mlanthology.org/tmlr/2023/soemers2023tmlr-general/}
}