GFlowNet Training by Policy Gradients

Abstract

Generative Flow Networks (GFlowNets) have been shown to be effective for generating combinatorial objects with desired properties. Here we propose a new GFlowNet training framework, with policy-dependent rewards, that bridges maintaining the flow balance of GFlowNets and optimizing the expected accumulated reward in traditional reinforcement learning (RL). This enables the derivation of new policy-based GFlowNet training methods, in contrast to existing methods that resemble value-based RL. It is known that the design of the backward policy in GFlowNet training affects efficiency. We further develop a coupled training strategy that jointly solves GFlowNet forward policy training and backward policy design. A performance analysis is provided with a theoretical guarantee for our policy-based GFlowNet training. Experiments on both simulated and real-world datasets verify that our policy-based strategies provide advanced RL perspectives for robust gradient estimation, improving GFlowNet performance. Our code is available at: github.com/niupuhua1234/GFN-PG.
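For context, a minimal sketch of the connection the abstract describes, using standard GFlowNet notation ($P_F$, $P_B$, $Z$, $R$) rather than the paper's exact formulation; the specific policy-dependent reward and resulting gradient estimators are defined in the full text, so the second identity below is only the generic score-function form such a method would build on.

% Standard GFlowNet trajectory balance: a forward policy P_F and backward
% policy P_B are balanced when, for every complete trajectory
% tau = (s_0 -> ... -> s_n = x),
\[
  Z \prod_{t=0}^{n-1} P_F(s_{t+1} \mid s_t)
  \;=\;
  R(x) \prod_{t=0}^{n-1} P_B(s_t \mid s_{t+1}).
\]
% A policy-gradient view treats a trajectory-level reward r(tau), which here
% itself depends on the policy parameters theta, and differentiates the
% expectation under the forward policy (REINFORCE-style identity):
\[
  \nabla_\theta \, \mathbb{E}_{\tau \sim P_F^\theta}\!\left[ r_\theta(\tau) \right]
  =
  \mathbb{E}_{\tau \sim P_F^\theta}\!\left[
    r_\theta(\tau)\, \nabla_\theta \log P_F^\theta(\tau)
    + \nabla_\theta r_\theta(\tau)
  \right].
\]
% The extra \nabla_\theta r_\theta(\tau) term, absent in standard REINFORCE,
% is what the "policy-dependent reward" framing introduces.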

Cite

Text

Niu et al. "GFlowNet Training by Policy Gradients." International Conference on Machine Learning, 2024.

Markdown

[Niu et al. "GFlowNet Training by Policy Gradients." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/niu2024icml-gflownet/)

BibTeX

@inproceedings{niu2024icml-gflownet,
  title     = {{GFlowNet Training by Policy Gradients}},
  author    = {Niu, Puhua and Wu, Shili and Fan, Mingzhou and Qian, Xiaoning},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {38344--38380},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/niu2024icml-gflownet/}
}