Exploring Zero-Shot Structure-Based Protein Fitness Prediction
Abstract
The ability to make zero-shot predictions about the fitness consequences of protein sequence changes with pre-trained machine learning models enables many practical applications. Such models can be applied to downstream tasks like genetic variant interpretation and protein engineering without additional labeled data. The advent of capable protein structure prediction tools has led to the availability of orders of magnitude more precomputed predicted structures, giving rise to powerful structure-based fitness prediction models. Through our experiments, we assess several modeling choices for structure-based models and their effects on downstream fitness prediction. We find that training on predicted structures can negatively affect downstream predictions when using experimental structures, that zero-shot fitness prediction models can struggle to learn the fitness landscapes of proteins with disordered regions (lacking a fixed 3D structure), and that predicted structures for disordered regions can be misleading in this setting and affect predictive performance. Lastly, we evaluate an additional structure-based model on the ProteinGym substitution benchmark and show that simple multi-modal ensembles are strong baselines.
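As a rough illustration of what a "simple multi-modal ensemble" could look like, the sketch below standardizes per-variant scores from two models and averages them, then measures agreement with measured fitness using Spearman rank correlation. This is an assumption for illustration only: the abstract does not specify how the ensembles are built or which metric is reported, and the function names (`zscore`, `ensemble`, `ranks`, `spearman`) and all numbers are hypothetical.

```python
from statistics import mean, pstdev


def zscore(scores):
    """Standardize one model's scores so models on different scales are comparable."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # A constant score vector carries no ranking information.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def ensemble(*model_scores):
    """Average the standardized scores of several models, variant by variant."""
    standardized = [zscore(s) for s in model_scores]
    return [mean(column) for column in zip(*standardized)]


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks.

    Assumes neither input is constant (otherwise the denominator is zero).
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical zero-shot scores for five variants of one protein.
sequence_model = [-2.1, -0.3, -1.5, 0.4, -0.9]   # e.g. a sequence-based model
structure_model = [-6.0, -1.0, -2.5, -0.5, -4.0]  # e.g. a structure-based model
measured_fitness = [0.10, 0.80, 0.35, 0.95, 0.40]  # made-up assay values

combined = ensemble(sequence_model, structure_model)
print(spearman(combined, measured_fitness))
```

Standardizing before averaging keeps a model with a wider score range from dominating the ensemble. In practice a library routine such as `scipy.stats.spearmanr` would typically replace the hand-written correlation.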
Cite
Text
Sharma and Gitter. "Exploring Zero-Shot Structure-Based Protein Fitness Prediction." ICLR 2025 Workshops: GEM, 2025.
Markdown
[Sharma and Gitter. "Exploring Zero-Shot Structure-Based Protein Fitness Prediction." ICLR 2025 Workshops: GEM, 2025.](https://mlanthology.org/iclrw/2025/sharma2025iclrw-exploring/)
BibTeX
@inproceedings{sharma2025iclrw-exploring,
title = {{Exploring Zero-Shot Structure-Based Protein Fitness Prediction}},
author = {Sharma, Arnav and Gitter, Anthony},
booktitle = {ICLR 2025 Workshops: GEM},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/sharma2025iclrw-exploring/}
}