Gumbel-SoftMax Score and Flow Matching for Discrete Biological Sequence Generation
Abstract
We introduce Gumbel-Softmax Score and Flow Matching, a generative framework built on a novel Gumbel-Softmax interpolant that moves from smooth categorical distributions to a distribution concentrated at a single vertex of the simplex, controlled by a time-dependent temperature parameter. Using this interpolant, we derive Gumbel-Softmax Flow Matching, in which a parameterized velocity field transports smooth categorical distributions to the vertices of the simplex. As an alternative, we present Gumbel-Softmax Score Matching, which learns to regress the gradient of the probability density. Our approach allows controllable generation with tunable temperatures and stochastic Gumbel noise during inference, which supports efficient de novo sequence design. Our experiments demonstrate state-of-the-art performance in conditional DNA promoter design and strong results in de novo sequence-only protein generation.
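To make the interpolant described in the abstract concrete, here is a minimal NumPy sketch of sampling a point on the simplex from a one-hot target with a time-dependent temperature. The abstract does not give the schedule or the exact parameterization, so the exponential decay in `temperature`, the constants `tau_max` and `decay`, the use of the one-hot vector directly as logits, and the function names are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def temperature(t, tau_max=10.0, decay=3.0):
    """Assumed schedule: temperature decays exponentially as t goes from 0 to 1."""
    return tau_max * np.exp(-decay * t)

def gumbel_softmax_interpolant(x1, t, rng, tau_max=10.0, decay=3.0):
    """Sample a noisy simplex point for one-hot targets x1 (shape [..., K]) at time t."""
    # Gumbel(0, 1) noise via the inverse transform of uniform samples.
    u = rng.uniform(low=1e-10, high=1.0, size=x1.shape)
    g = -np.log(-np.log(u))
    # Assumption: the one-hot target is used directly as the logits.
    logits = (x1 + g) / temperature(t, tau_max, decay)
    # Numerically stable softmax over the category axis.
    logits -= logits.max(axis=-1, keepdims=True)
    p = np.exp(logits)
    return p / p.sum(axis=-1, keepdims=True)

# Usage: three positions over a four-letter alphabet (e.g. DNA bases).
x1 = np.eye(4)[[0, 2, 1]]
early = gumbel_softmax_interpolant(x1, 0.0, np.random.default_rng(0))
late = gumbel_softmax_interpolant(x1, 1.0, np.random.default_rng(0))
```

With the same noise draw, a high temperature (`early`) gives a smooth categorical distribution, while a low temperature (`late`) pushes each row toward a vertex of the simplex. The stochastic Gumbel term means that vertex is not guaranteed to be the target category in a single sample.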
Cite
Text
Tang et al. "Gumbel-SoftMax Score and Flow Matching for Discrete Biological Sequence Generation." ICLR 2025 Workshops: AI4NA, 2025.
Markdown
[Tang et al. "Gumbel-SoftMax Score and Flow Matching for Discrete Biological Sequence Generation." ICLR 2025 Workshops: AI4NA, 2025.](https://mlanthology.org/iclrw/2025/tang2025iclrw-gumbelsoftmax/)
BibTeX
@inproceedings{tang2025iclrw-gumbelsoftmax,
title = {{Gumbel-SoftMax Score and Flow Matching for Discrete Biological Sequence Generation}},
author = {Tang, Sophia and Zhang, Yinuo and Tong, Alexander and Chatterjee, Pranam},
booktitle = {ICLR 2025 Workshops: AI4NA},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/tang2025iclrw-gumbelsoftmax/}
}