Optimizing Protein Fitness Using Gibbs Sampling with Graph-Based Smoothing
Abstract
The ability to design novel proteins with higher fitness on a given task would be revolutionary for many fields of medicine. However, brute-force search through the combinatorially large space of sequences is infeasible. Prior methods constrain search to a small mutational radius from a reference sequence, but such heuristics drastically limit the design space. Our work seeks to remove the restriction on mutational distance while enabling efficient exploration. We propose Gibbs sampling with Graph-based Smoothing (GGS), which iteratively applies Gibbs with gradients to propose advantageous mutations and uses graph-based smoothing to remove noisy gradients that lead to false positives. Our method is state-of-the-art in discovering high-fitness proteins with up to 8 mutations from the training set. We study the GFP and AAV design problems, ablations, and baselines to elucidate the results. Code: https://github.com/kirjner/GGS
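The two ingredients named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it only shows the general shape of the ideas under stated assumptions. The smoothing step here is Tikhonov (Laplacian) regularization on a Hamming-distance k-nearest-neighbour graph, which is one common form of graph smoothing and may differ from the formulation in the paper. The proposal step samples a single-site mutation from a softmax over first-order estimates of the fitness change, in the spirit of Gibbs with gradients; the Metropolis-Hastings acceptance step is omitted. All function names, the integer token encoding, `k`, `gamma`, and `temperature` are hypothetical choices made for illustration.

```python
import numpy as np


def hamming_knn_laplacian(seqs, k=2):
    """Unnormalized Laplacian of a symmetrized kNN graph under Hamming distance.

    seqs: integer array of shape (n_sequences, length); the encoding is an assumption.
    """
    X = np.asarray(seqs)
    n = len(X)
    dist = (X[:, None, :] != X[None, :, :]).sum(-1).astype(float)
    np.fill_diagonal(dist, np.inf)  # a sequence is not its own neighbour
    A = np.zeros((n, n))
    for i in range(n):
        A[i, np.argsort(dist[i])[:k]] = 1.0
    A = np.maximum(A, A.T)  # symmetrize so the Laplacian is symmetric
    return np.diag(A.sum(1)) - A


def smooth_labels(y, lap, gamma=1.0):
    """Tikhonov smoothing: argmin_z ||z - y||^2 + gamma * z^T lap z.

    The closed-form minimizer is (I + gamma * lap)^{-1} y.
    """
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(np.eye(len(y)) + gamma * lap, y)


def gradient_guided_proposal(x, grad, temperature=2.0, rng=None):
    """Sample one (position, token) mutation from a softmax over estimated gains.

    x:    current sequence as integer tokens, shape (length,)
    grad: gradient of a fitness model w.r.t. the one-hot input, shape (length, vocab)
    The gain of mutating position i to token a is approximated to first order by
    grad[i, a] - grad[i, x[i]].
    """
    rng = np.random.default_rng() if rng is None else rng
    n_pos, n_tok = grad.shape
    idx = np.arange(n_pos)
    gain = grad - grad[idx, x][:, None]
    gain[idx, x] = -np.inf  # exclude "mutations" that keep the current token
    logits = gain / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    flat = rng.choice(n_pos * n_tok, p=probs.ravel())
    pos, tok = divmod(int(flat), n_tok)
    return (pos, tok), probs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seqs = rng.integers(0, 4, size=(6, 5))   # toy data: 6 sequences, 4-letter alphabet
    noisy_fitness = rng.normal(size=6)
    lap = hamming_knn_laplacian(seqs, k=2)
    print(smooth_labels(noisy_fitness, lap, gamma=1.0))

    x = seqs[0]
    toy_grad = rng.normal(size=(5, 4))       # stands in for a trained model's gradient
    (pos, tok), _ = gradient_guided_proposal(x, toy_grad, rng=rng)
    print(pos, tok)
```

In a full pipeline one would presumably fit the fitness model to the smoothed labels before taking gradients, so that the proposal distribution is driven by a less noisy landscape; how the paper couples the two steps is not specified in the abstract.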
Cite
Text
Kirjner et al. "Optimizing Protein Fitness Using Gibbs Sampling with Graph-Based Smoothing." ICML 2023 Workshops: SODS, 2023.
Markdown
[Kirjner et al. "Optimizing Protein Fitness Using Gibbs Sampling with Graph-Based Smoothing." ICML 2023 Workshops: SODS, 2023.](https://mlanthology.org/icmlw/2023/kirjner2023icmlw-optimizing/)
BibTeX
@inproceedings{kirjner2023icmlw-optimizing,
title = {{Optimizing Protein Fitness Using Gibbs Sampling with Graph-Based Smoothing}},
author = {Kirjner, Andrew and Yim, Jason and Samusevich, Raman and Jaakkola, Tommi S. and Barzilay, Regina and Fiete, Ila R},
booktitle = {ICML 2023 Workshops: SODS},
year = {2023},
url = {https://mlanthology.org/icmlw/2023/kirjner2023icmlw-optimizing/}
}