Benchmarking Probabilistic Machine Learning in Protein Fitness Landscape Predictions
Abstract
Machine learning-guided protein engineering, which combines high-throughput screening and deep sequencing of protein mutagenesis libraries with machine learning, is a powerful approach for engineering proteins and interrogating their fitness landscapes. Uncertainty quantification enhances the trustworthiness of model predictions by indicating their reliability and can thus guide downstream experimental work. Aleatoric uncertainty captures the inherent observational noise in protein properties, whereas epistemic uncertainty reveals gaps in the model’s knowledge that stem from limited training data. Although uncertainty quantification has been investigated in protein engineering, systematic benchmarks for selecting probabilistic machine learning models and for assessing the benefits of different types of uncertainty in protein fitness prediction are lacking. To address this gap, our study benchmarks six advanced probabilistic modeling techniques across eleven diverse protein-fitness datasets, using metrics of prediction accuracy and uncertainty quality to assess performance in both in-distribution and out-of-distribution scenarios. Our findings offer insights into the application of uncertainty-aware machine learning in high-throughput protein screening experiments. The study supports more robust and efficient experimental processes and enhances the practical usability of machine learning models in real-world protein fitness tasks such as therapeutic antibody optimization and viral evolution.
Cite
Text
Chen et al. "Benchmarking Probabilistic Machine Learning in Protein Fitness Landscape Predictions." ICML 2024 Workshops: ML4LMS, 2024.
Markdown
[Chen et al. "Benchmarking Probabilistic Machine Learning in Protein Fitness Landscape Predictions." ICML 2024 Workshops: ML4LMS, 2024.](https://mlanthology.org/icmlw/2024/chen2024icmlw-benchmarking/)
BibTeX
@inproceedings{chen2024icmlw-benchmarking,
title = {{Benchmarking Probabilistic Machine Learning in Protein Fitness Landscape Predictions}},
author = {Chen, Ningning and Han, Wenkai and Reddy, Sai T.},
booktitle = {ICML 2024 Workshops: ML4LMS},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/chen2024icmlw-benchmarking/}
}