QGFN: Controllable Greediness with Action Values

Abstract

Generative Flow Networks (GFlowNets; GFNs) are a family of energy-based generative methods for combinatorial objects, capable of generating diverse and high-utility samples. However, consistently biasing GFNs towards producing high-utility samples is non-trivial. In this work, we leverage connections between GFNs and reinforcement learning (RL) and propose to combine the GFN policy with an action-value estimate, $Q$, to create greedier sampling policies which can be controlled by a mixing parameter. We show that several variants of the proposed method, QGFN, are able to improve on the number of high-reward samples generated in a variety of tasks without sacrificing diversity.
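As an illustration of what combining a GFN policy with $Q$ under a mixing parameter could look like, the following minimal sketch mixes a forward-policy distribution with a greedy choice on action values. It is an assumption for exposition, not the authors' implementation: the abstract does not specify the mixing rule, and the names `mix_policy`, `sample_action`, `gfn_probs`, `q_values`, and `p` are hypothetical. Here `p = 0` recovers the plain GFN policy and `p = 1` acts fully greedily on $Q$.

```python
import random


def mix_policy(gfn_probs, q_values, p):
    """Mix a GFN action distribution with a greedy choice on Q.

    Assumed rule (not taken from the paper's text): with probability p take
    the action with the highest Q value, otherwise sample from the GFN policy.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if len(gfn_probs) != len(q_values):
        raise ValueError("gfn_probs and q_values must have the same length")
    # Index of the action with the largest estimated action value.
    greedy = max(range(len(q_values)), key=q_values.__getitem__)
    # Scale the GFN probabilities by (1 - p), then give mass p to the greedy action.
    mixed = [(1.0 - p) * pf for pf in gfn_probs]
    mixed[greedy] += p
    return mixed


def sample_action(gfn_probs, q_values, p, rng=random):
    """Draw one action index from the mixed distribution."""
    mixed = mix_policy(gfn_probs, q_values, p)
    return rng.choices(range(len(mixed)), weights=mixed, k=1)[0]


if __name__ == "__main__":
    gfn_probs = [0.5, 0.3, 0.2]   # hypothetical GFN forward-policy probabilities
    q_values = [0.1, 0.9, 0.4]    # hypothetical action-value estimates
    for p in (0.0, 0.5, 1.0):
        print(p, mix_policy(gfn_probs, q_values, p))
```

Raising `p` shifts probability mass toward the highest-valued action while every action the GFN policy supports keeps nonzero probability for any `p < 1`, which is one way greediness can be increased without collapsing onto a single trajectory.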

Cite

Text

Lau et al. "QGFN: Controllable Greediness with Action Values." Neural Information Processing Systems, 2024. doi:10.52202/079017-2594

Markdown

[Lau et al. "QGFN: Controllable Greediness with Action Values." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/lau2024neurips-qgfn/) doi:10.52202/079017-2594

BibTeX

@inproceedings{lau2024neurips-qgfn,
  title     = {{QGFN: Controllable Greediness with Action Values}},
  author    = {Lau, Elaine and Lu, Stephen Zhewen and Pan, Ling and Precup, Doina and Bengio, Emmanuel},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2594},
  url       = {https://mlanthology.org/neurips/2024/lau2024neurips-qgfn/}
}