Pessimistic Backward Policy for GFlowNets
Abstract
This paper studies Generative Flow Networks (GFlowNets), which learn to sample objects proportionally to a given reward function through trajectories of state transitions. In this work, we observe that GFlowNets tend to under-exploit high-reward objects when trained on an insufficient number of trajectories, which may lead to a large gap between the estimated flow and the (known) reward value. In response to this challenge, we propose a pessimistic backward policy for GFlowNets (PBP-GFN), which maximizes the observed flow to align closely with the true reward for the object. We extensively evaluate PBP-GFN across eight benchmarks, including the hyper-grid environment, bag generation, structured set generation, molecular generation, and four RNA sequence generation tasks. In particular, PBP-GFN enhances the discovery of high-reward objects, maintains the diversity of the objects, and consistently outperforms existing methods.
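The sketch below (not the authors' code) illustrates the setting the abstract describes, using the standard trajectory-balance objective for GFlowNets together with one hedged reading of the "pessimistic backward policy" idea: training the backward policy to maximize the likelihood of observed trajectories reaching an object, so the flow assigned to that object aligns with its known reward. The function names `trajectory_balance_loss` and `pessimistic_backward_loss` are hypothetical, and the per-step log-probabilities are dummy values for a single three-step trajectory.

```python
# A minimal sketch, assuming a trajectory-balance-style GFlowNet objective.
# Not the authors' implementation of PBP-GFN.
import torch

def trajectory_balance_loss(log_Z, log_pf, log_pb, log_reward):
    """Trajectory balance: (log Z + sum log P_F - log R(x) - sum log P_B)^2.

    log_pf, log_pb: per-step forward/backward log-probabilities along one
    trajectory (1-D tensors); log_reward: log R(x) of the terminal object.
    """
    return (log_Z + log_pf.sum() - log_reward - log_pb.sum()) ** 2

def pessimistic_backward_loss(log_pb):
    """Hypothetical reading of 'maximizing the observed flow': train the
    backward policy to maximize the likelihood of an observed trajectory,
    i.e. minimize its negative backward log-likelihood."""
    return -log_pb.sum()

# Toy usage with dummy log-probs for a 3-step trajectory ending at object x.
log_Z = torch.tensor(0.0, requires_grad=True)
log_pf = torch.log(torch.tensor([0.5, 0.4, 0.9]))
log_pb = torch.log(torch.tensor([0.6, 0.5, 1.0]))
log_reward = torch.log(torch.tensor(2.0))  # known reward R(x) = 2.0

loss_tb = trajectory_balance_loss(log_Z, log_pf, log_pb, log_reward)
loss_pb = pessimistic_backward_loss(log_pb)
```

In practice the log-probabilities would come from learned forward and backward policy networks; the point of the sketch is only to show where the backward policy enters the flow-matching objective that the abstract refers to.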
Cite
Text
Jang et al. "Pessimistic Backward Policy for GFlowNets." Neural Information Processing Systems, 2024. doi:10.52202/079017-3400
Markdown
[Jang et al. "Pessimistic Backward Policy for GFlowNets." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/jang2024neurips-pessimistic/) doi:10.52202/079017-3400
BibTeX
@inproceedings{jang2024neurips-pessimistic,
  title = {{Pessimistic Backward Policy for GFlowNets}},
  author = {Jang, Hyosoon and Jang, Yunhui and Kim, Minsu and Park, Jinkyoo and Ahn, Sungsoo},
  booktitle = {Neural Information Processing Systems},
  year = {2024},
  doi = {10.52202/079017-3400},
  url = {https://mlanthology.org/neurips/2024/jang2024neurips-pessimistic/}
}