GFlowNet Training by Policy Gradients
Abstract
Generative Flow Networks (GFlowNets) have been shown to be effective at generating combinatorial objects with desired properties. Here we propose a new GFlowNet training framework, with policy-dependent rewards, that bridges maintaining the flow balance of GFlowNets and optimizing the expected accumulated reward in traditional Reinforcement Learning (RL). This enables the derivation of new policy-based GFlowNet training methods, in contrast to existing methods that resemble value-based RL. It is known that the design of backward policies in GFlowNet training affects efficiency. We further develop a coupled training strategy that jointly solves GFlowNet forward policy training and backward policy design. We provide a performance analysis with a theoretical guarantee for our policy-based GFlowNet training. Experiments on both simulated and real-world datasets verify that our policy-based strategies offer robust gradient estimation from advanced RL perspectives, improving GFlowNet performance. Our code is available at: github.com/niupuhua1234/GFN-PG.
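
To make the policy-gradient idea concrete, below is a minimal sketch of a REINFORCE-style update for a GFlowNet forward policy, where the per-trajectory return is a trajectory-balance log-ratio and is therefore policy-dependent. This is an illustrative sketch, not the authors' GFN-PG implementation: it assumes a PyTorch setup with hypothetical helpers `sample_trajectory` and `log_reward`, and policy objects exposing a `log_prob` method over full trajectories.

```python
import torch

def policy_gradient_step(forward_policy, backward_policy, log_Z, optimizer,
                         sample_trajectory, log_reward, batch_size=16):
    """One REINFORCE-style update of a GFlowNet forward policy (sketch).

    The per-trajectory return is the trajectory-balance log-ratio
    log R(x) + log P_B(tau|x) - log Z - log P_F(tau), which is
    policy-dependent because P_F appears inside it.
    """
    log_pf, log_pb, log_r = [], [], []
    for _ in range(batch_size):
        tau, x = sample_trajectory(forward_policy)    # roll out s_0 -> x
        log_pf.append(forward_policy.log_prob(tau))   # sum_t log P_F(s_{t+1}|s_t)
        log_pb.append(backward_policy.log_prob(tau))  # sum_t log P_B(s_t|s_{t+1})
        log_r.append(log_reward(x))
    log_pf = torch.stack(log_pf)
    log_pb = torch.stack(log_pb)
    log_r = torch.stack(log_r)

    # Policy-dependent return, detached so it acts as the REINFORCE weight;
    # the resulting estimator matches the gradient of KL(P_F || P_B * R / Z).
    ret = (log_r + log_pb - log_Z - log_pf).detach()
    baseline = ret.mean()                             # variance-reduction baseline
    loss = -((ret - baseline) * log_pf).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the coupled strategy described in the abstract, the backward policy would be updated by a separate step rather than held fixed as assumed in this sketch.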
Cite
Text
Niu et al. "GFlowNet Training by Policy Gradients." International Conference on Machine Learning, 2024.
Markdown
[Niu et al. "GFlowNet Training by Policy Gradients." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/niu2024icml-gflownet/)
BibTeX
@inproceedings{niu2024icml-gflownet,
  title     = {{GFlowNet Training by Policy Gradients}},
  author    = {Niu, Puhua and Wu, Shili and Fan, Mingzhou and Qian, Xiaoning},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {38344--38380},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/niu2024icml-gflownet/}
}