ProteinRL: Reinforcement Learning with Generative Protein Language Models for Property-Directed Sequence Design

Abstract

The overarching goal of protein engineering is the design and optimization of proteins customized for specific purposes. Generative protein language models (PLMs) enable de novo protein sequence generation; however, current PLMs cannot control generation so that sequences are tailored to desired properties. Here we present ProteinRL, a flexible, data-driven reinforcement learning framework for fine-tuning generative PLMs for the de novo design of sequences optimized for specific sequence and/or structural properties. We highlight two realistic protein design goals: a single-objective design of sequences with unusually high charge content, and a multi-objective hit-expansion scenario that diversifies a target sequence with generated sequences that have high-confidence structure predictions and a high predicted probability of soluble expression. In both cases, ProteinRL fine-tuning guides the PLM toward generating sequences optimized for the defined properties, reaching property values rarely or never observed in natural sequences or in sequences generated without ProteinRL fine-tuning. The demonstrated success and adaptability of the ProteinRL framework enable the de novo design of novel protein sequences optimized for applications across many areas of protein engineering.
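To make the idea of steering a generative model toward a sequence property with reinforcement learning concrete, here is a minimal sketch. It is not ProteinRL's implementation and makes several loudly stated assumptions: a toy policy with an independent categorical distribution per position stands in for a generative PLM, the reward `charge_fraction` (fraction of D, E, K, R residues) is a hypothetical proxy for "charge content", and the update is plain REINFORCE with a batch-mean baseline. The names `ToyPolicy`, `reinforce_step`, and all hyperparameters are illustrative.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
CHARGED = set("DEKR")  # assumption: Asp, Glu, Lys, Arg count as "charged"


def charge_fraction(seq: str) -> float:
    """Hypothetical reward: fraction of residues in seq that are charged."""
    return sum(aa in CHARGED for aa in seq) / len(seq)


class ToyPolicy:
    """Toy stand-in for a generative PLM (assumption): one independent
    categorical distribution over the 20 amino acids at each position."""

    def __init__(self, length: int):
        self.length = length
        self.logits = [[0.0] * len(AMINO_ACIDS) for _ in range(length)]

    def probs(self) -> list[list[float]]:
        """Numerically stable softmax of the logits at every position."""
        out = []
        for row in self.logits:
            m = max(row)
            exps = [math.exp(x - m) for x in row]
            z = sum(exps)
            out.append([e / z for e in exps])
        return out


def reinforce_step(policy: ToyPolicy, rng: random.Random,
                   batch_size: int = 32, lr: float = 20.0) -> float:
    """One REINFORCE update with a batch-mean baseline; returns the mean reward."""
    probs = policy.probs()  # distribution used for sampling and for the gradient
    idx = range(len(AMINO_ACIDS))
    batch = [[rng.choices(idx, weights=p)[0] for p in probs]
             for _ in range(batch_size)]
    rewards = [charge_fraction("".join(AMINO_ACIDS[t] for t in seq))
               for seq in batch]
    baseline = sum(rewards) / batch_size

    # d log softmax / d logits = one_hot(token) - probs, weighted by the advantage.
    for pos in range(policy.length):
        for k in idx:
            grad = sum((r - baseline) * (float(seq[pos] == k) - probs[pos][k])
                       for seq, r in zip(batch, rewards))
            policy.logits[pos][k] += lr * grad / batch_size  # gradient ascent
    return baseline


if __name__ == "__main__":
    rng = random.Random(0)
    policy = ToyPolicy(length=20)
    history = [reinforce_step(policy, rng) for _ in range(100)]
    print(f"mean reward: first step {history[0]:.2f}, last step {history[-1]:.2f}")
```

Under uniform initial logits the expected reward is 0.2 (4 of the 20 amino acids are charged), and the batch-mean reward should climb well above that as training proceeds. Under the same assumptions, a multi-objective goal like the hit-expansion case would only change the reward, for example to a weighted combination of several property predictors.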

Cite

Sternke and Karpiak. "ProteinRL: Reinforcement Learning with Generative Protein Language Models for Property-Directed Sequence Design." NeurIPS 2023 Workshops: GenBio, 2023.

BibTeX

@inproceedings{sternke2023neuripsw-proteinrl,
  title     = {{ProteinRL: Reinforcement Learning with Generative Protein Language Models for Property-Directed Sequence Design}},
  author    = {Sternke, Matt and Karpiak, Joel},
  booktitle = {NeurIPS 2023 Workshops: GenBio},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/sternke2023neuripsw-proteinrl/}
}