Bayesian Optimization for Protein Sequence Design: Back to Simplicity with Gaussian Processes

Abstract

Bayesian optimization (BO) is a popular sequential decision-making approach for maximizing black-box functions in low-data regimes. In biology, it has been used to find well-performing protein sequence candidates, because gradient information is not available from in vitro experiments. Recent in silico design methods have leveraged large pre-trained protein language models (PLMs) to predict protein fitness. However, PLMs have several shortcomings for sequential design tasks: (i) their currently limited ability to model uncertainty, (ii) the lack of closed-form Bayesian updates in light of new experimental data, and (iii) the difficulty of fine-tuning them on small downstream task datasets. We return to traditional BO and investigate Gaussian process (GP) surrogate models with various sequence kernels, which can properly model uncertainty and update their belief across the rounds of a multi-round design task. We evaluate our method empirically on the ProteinGym sequence design benchmark and show that BO with GPs is competitive with large state-of-the-art (SOTA) pre-trained PLMs at a fraction of the compute budget.
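The abstract does not say which sequence kernels or acquisition function are used, so the following is only a minimal sketch under stated assumptions: an exact GP with an exponentiated Hamming kernel, k(x, x') = exp(-d_H(x, x') / l), a constant prior mean, and an upper-confidence-bound (UCB) acquisition over a fixed candidate pool. The names `SequenceGP`, `propose`, `toy_assay` and `TARGET`, the toy objective, and all hyperparameters are hypothetical. The sketch illustrates the two GP properties the abstract relies on: a posterior variance that quantifies uncertainty, and a belief update after each round that is a closed-form refit (one Cholesky factorization) rather than gradient-based fine-tuning.

```python
import itertools
import math
from typing import List, Tuple

Matrix = List[List[float]]

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(ca != cb for ca, cb in zip(a, b))

def kernel(a: str, b: str, lengthscale: float = 2.0) -> float:
    """Exponentiated Hamming kernel (assumed); an RBF kernel on one-hot encodings, hence PSD."""
    return math.exp(-hamming(a, b) / lengthscale)

def cholesky(A: Matrix) -> Matrix:
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L: Matrix, b: List[float]) -> List[float]:
    """Solve L x = b by forward substitution."""
    x: List[float] = []
    for i, bi in enumerate(b):
        x.append((bi - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_t(L: Matrix, b: List[float]) -> List[float]:
    """Solve L^T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

class SequenceGP:
    """Exact GP regression over sequences; conditioning on new data is a closed-form refit."""

    def __init__(self, X: List[str], y: List[float], noise: float = 1e-3):
        self.X = list(X)
        self.y_mean = sum(y) / len(y)  # constant prior mean = mean of observations
        K = [[kernel(a, b) + (noise if i == j else 0.0) for j, b in enumerate(self.X)]
             for i, a in enumerate(self.X)]
        self.L = cholesky(K)
        centred = [v - self.y_mean for v in y]
        # alpha = (K + noise * I)^-1 (y - y_mean), via two triangular solves
        self.alpha = solve_upper_t(self.L, solve_lower(self.L, centred))

    def predict(self, x: str) -> Tuple[float, float]:
        """Posterior mean and variance of the latent fitness at sequence x."""
        k_star = [kernel(xi, x) for xi in self.X]
        mean = self.y_mean + sum(k * a for k, a in zip(k_star, self.alpha))
        v = solve_lower(self.L, k_star)
        var = kernel(x, x) - sum(vi * vi for vi in v)
        return mean, max(var, 0.0)

def propose(gp: SequenceGP, candidates: List[str], beta: float = 2.0) -> str:
    """Return the unobserved candidate maximizing UCB = mean + beta * std."""
    seen = set(gp.X)
    best_score, best_seq = -math.inf, ""
    for c in candidates:
        if c in seen:
            continue
        mean, var = gp.predict(c)
        score = mean + beta * math.sqrt(var)
        if score > best_score:
            best_score, best_seq = score, c
    return best_seq

TARGET = "ACGT"  # hypothetical hidden optimum for the toy assay below

def toy_assay(seq: str) -> float:
    """Hypothetical stand-in for a wet-lab measurement: positions matching TARGET."""
    return float(len(seq) - hamming(seq, TARGET))

pool = ["".join(p) for p in itertools.product("ACGT", repeat=4)]  # all 256 4-mers
X, y = ["TTTT", "GGGG"], [toy_assay("TTTT"), toy_assay("GGGG")]
for _ in range(6):  # six design rounds, one new measurement per round
    gp = SequenceGP(X, y)  # posterior given all data so far, no gradient training
    nxt = propose(gp, pool)
    X.append(nxt)
    y.append(toy_assay(nxt))
print("best (fitness, sequence):", max(zip(y, X)))
```

In practice the lengthscale and noise would be fitted by maximizing the marginal likelihood, and a GP library would replace the hand-written linear algebra. Plain Python is used here only to keep the sketch dependency-free.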

Cite

Benjamins et al. "Bayesian Optimization for Protein Sequence Design: Back to Simplicity with Gaussian Processes." NeurIPS 2024 Workshops: AI4Mat, 2024.

BibTeX

@inproceedings{benjamins2024neuripsw-bayesian,
  title     = {{Bayesian Optimization for Protein Sequence Design: Back to Simplicity with Gaussian Processes}},
  author    = {Benjamins, Carolin and Surana, Shikha and Bent, Oliver and Lindauer, Marius and Duckworth, Paul},
  booktitle = {NeurIPS 2024 Workshops: AI4Mat},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/benjamins2024neuripsw-bayesian/}
}