From Minimal Data to Maximal Insight: A Machine Learning Guided Platform for Peptide Discovery

Abstract

Peptide biologics represent a promising therapeutic frontier, but their discovery and optimization are often hindered by the extensive training datasets that machine learning approaches require. Here we present Minimal Data Maximal Insight (MDMI), a novel computational method that enables peptide discovery using limited data (~100 sequences). Using a split Green Fluorescent Protein (GFP) system as our model system, we combine a sequence-agnostic model with statistical potential scoring and physics-based evaluation to create an ensemble predictive model. This is coupled with a genetic algorithm for sequence optimization. With only one round of screening, we developed a model that yielded novel functional sequences, 63% of which exhibited fluorescence. Notably, by analyzing high-activity sequences to identify favorable amino acids at each position, we were able to design peptide variants with more than 50% sequence difference from the wild type (far exceeding the mutation rates present in our training data) while maintaining functionality. By reducing dependency on large datasets, MDMI democratizes access to advanced computational tools for peptide engineering and offers a blueprint for accelerating therapeutic peptide discovery across various applications, from antimicrobials to targeted drug delivery.
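
The two ideas named in the abstract, per-position analysis of high-activity sequences and genetic-algorithm optimization, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the activity threshold, the truncation selection with elitism, and the single-point crossover are all assumptions made for illustration, and the `score` callable stands in for the ensemble predictive model, whose details the abstract does not give.

```python
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def favorable_residues(sequences, activities, threshold, top_k=2):
    """Per position, list the top_k most frequent residues among
    sequences whose measured activity is at least `threshold`.
    (Hypothetical helper; frequency counting is an assumed criterion.)"""
    active = [s for s, a in zip(sequences, activities) if a >= threshold]
    if not active:
        raise ValueError("no sequences meet the activity threshold")
    length = len(active[0])
    if any(len(s) != length for s in active):
        raise ValueError("sequences must be aligned to the same length")
    table = []
    for pos in range(length):
        counts = Counter(s[pos] for s in active)
        table.append([aa for aa, _ in counts.most_common(top_k)])
    return table


def mutate(seq, rate, rng):
    """Replace each residue with a random one with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in seq
    )


def crossover(a, b, rng):
    """Single-point crossover of two equal-length sequences (length >= 2)."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def evolve(population, score, generations=20, rate=0.1, seed=0):
    """Toy genetic algorithm: keep the better half unchanged (elitism)
    and refill the population with mutated crossovers of those parents.
    `score` is a placeholder for whatever predictive model is available."""
    rng = random.Random(seed)
    pop = list(population)
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: max(2, len(pop) // 2)]
        children = []
        for _ in range(len(pop) - len(parents)):
            mother, father = rng.sample(parents, 2)
            children.append(mutate(crossover(mother, father, rng), rate, rng))
        pop = parents + children
    return max(pop, key=score)
```

As one possible usage, the table returned by `favorable_residues` can define a toy score that counts how many positions of a candidate carry a favorable residue, and that score can be passed to `evolve`. Because the better half of each generation is carried over unchanged, the best score in the population never decreases from one generation to the next.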

Cite

Text

Bayat et al. "From Minimal Data to Maximal Insight: A Machine Learning Guided Platform for Peptide Discovery." ICLR 2025 Workshops: GEM, 2025.

Markdown

[Bayat et al. "From Minimal Data to Maximal Insight: A Machine Learning Guided Platform for Peptide Discovery." ICLR 2025 Workshops: GEM, 2025.](https://mlanthology.org/iclrw/2025/bayat2025iclrw-minimal/)

BibTeX

@inproceedings{bayat2025iclrw-minimal,
  title     = {{From Minimal Data to Maximal Insight: A Machine Learning Guided Platform for Peptide Discovery}},
  author    = {Bayat, Pouriya and Perkins, Spencer and Clancy, Sebastian and Patel, Sahil Swapnesh and Yin, Richard Fei and Bozovičar, Krištof and Iwe, Idorenyin and Simchi, Mohammad and Zeisler, Ilan Yaniv and Singh, Serena and White, Vivian and Xie, Matthew and Palter, Sean and Pardee, Keith},
  booktitle = {ICLR 2025 Workshops: GEM},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/bayat2025iclrw-minimal/}
}