Protein Discovery with Discrete Walk-Jump Sampling

Abstract

We resolve difficulties in training and sampling from a discrete generative model by learning a smoothed energy function, sampling from the smoothed data manifold with Langevin Markov chain Monte Carlo (MCMC), and projecting back to the true data manifold with one-step denoising. Our Discrete Walk-Jump Sampling formalism combines the contrastive divergence training of an energy-based model with the improved sample quality of a score-based model, while simplifying training and sampling by requiring only a single noise level. We evaluate the robustness of our approach on generative modeling of antibody proteins and introduce the distributional conformity score to benchmark protein generative models. By optimizing and sampling from our models for the proposed distributional conformity score, we find that 97-100% of generated samples are successfully expressed and purified, and 70% of functional designs show equal or improved binding affinity compared to known functional antibodies, on the first attempt in a single round of laboratory experiments. We also report the first demonstration of long-run fast-mixing MCMC chains in which diverse antibody protein classes are visited in a single MCMC chain.
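The walk-jump procedure summarized above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation. The learned networks that supply the walk and jump steps in the paper are replaced by the exact score of a Gaussian-smoothed empirical distribution over a few hypothetical one-hot sequences. The walk uses plain overdamped Langevin updates, which may differ from the sampler used in the paper. The names ALPHABET, DATA, smoothed_score, walk, jump and to_sequence are all illustrative. The jump applies the one-step denoising estimate x_hat = y + sigma^2 * grad log p_sigma(y) at the single noise level sigma, and a per-position argmax projects back to discrete tokens.

```python
import numpy as np

# Hypothetical toy "data manifold": a few discrete sequences over a
# 4-letter alphabet, encoded as one-hot arrays of shape (L, K).
ALPHABET = "ACDE"
K = len(ALPHABET)
DATA = ["ACDE", "ACDD", "EDCA", "EECA"]


def one_hot(seq):
    x = np.zeros((len(seq), K))
    x[np.arange(len(seq)), [ALPHABET.index(c) for c in seq]] = 1.0
    return x


X = np.stack([one_hot(s) for s in DATA])  # shape (N, L, K)


def smoothed_score(y, sigma):
    """Exact score of p_sigma(y) = mean_i N(y; x_i, sigma^2 I).

    Assumption: stands in for the gradient of a learned smoothed
    energy, i.e. -grad E(y), at the single noise level sigma.
    """
    diffs = X - y                                   # x_i - y, shape (N, L, K)
    logw = -0.5 * np.sum(diffs**2, axis=(1, 2)) / sigma**2
    w = np.exp(logw - logw.max())                   # numerically stable softmax
    w /= w.sum()
    return np.tensordot(w, diffs, axes=1) / sigma**2


def walk(y, sigma, n_steps, step, rng):
    """Walk: unadjusted overdamped Langevin MCMC on the smoothed density."""
    for _ in range(n_steps):
        noise = rng.standard_normal(y.shape)
        y = y + step * smoothed_score(y, sigma) + np.sqrt(2.0 * step) * noise
    return y


def jump(y, sigma):
    """Jump: one-step denoising x_hat = y + sigma^2 * grad log p_sigma(y)."""
    return y + sigma**2 * smoothed_score(y, sigma)


def to_sequence(x_hat):
    """Project a denoised array back to a discrete sequence (argmax per position)."""
    return "".join(ALPHABET[i] for i in x_hat.argmax(axis=-1))


rng = np.random.default_rng(0)
sigma = 0.5
# Start the single chain from a noised data point: y = x + sigma * eps.
y = X[0] + sigma * rng.standard_normal(X[0].shape)
samples = []
for _ in range(5):
    y = walk(y, sigma, n_steps=50, step=0.01, rng=rng)  # keep walking the same chain
    samples.append(to_sequence(jump(y, sigma)))         # jump without resetting the walk
print(samples)
```

In this toy setting the jump reduces exactly to a softmax-weighted average of the training one-hot arrays, so every row of the denoised array sums to 1 before the argmax. Several jumps are taken from one continuing chain, which mirrors the idea of visiting different sequence classes within a single MCMC chain.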

Cite

Text

Frey et al. "Protein Discovery with Discrete Walk-Jump Sampling." International Conference on Learning Representations, 2024.

Markdown

[Frey et al. "Protein Discovery with Discrete Walk-Jump Sampling." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/frey2024iclr-protein/)

BibTeX

@inproceedings{frey2024iclr-protein,
  title     = {{Protein Discovery with Discrete Walk-Jump Sampling}},
  author    = {Frey, Nathan C. and Berenberg, Dan and Zadorozhny, Karina and Kleinhenz, Joseph and Lafrance-Vanasse, Julien and Hotzel, Isidro and Wu, Yan and Ra, Stephen and Bonneau, Richard and Cho, Kyunghyun and Loukas, Andreas and Gligorijevic, Vladimir and Saremi, Saeed},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/frey2024iclr-protein/}
}