SecretoGen: Towards Prediction of Signal Peptides for Efficient Protein Secretion

Abstract

Signal peptides (SPs) are short sequences at the N terminus of proteins that control their secretion in all living organisms. Secretion is of great importance in biotechnology, as industrial production of proteins in host organisms often requires the proteins to be secreted. SPs have varying secretion efficiency that depends on both the host organism and the protein they are combined with. Therefore, to optimize production yields, an SP with good efficiency needs to be identified for each protein. While SPs can be predicted accurately by machine learning models, such models have so far shown limited utility for predicting secretion efficiency. We introduce **SecretoGen**, a generative transformer trained on millions of naturally occurring SPs from diverse organisms. Evaluation on a range of secretion efficiency datasets shows that SecretoGen's perplexity is promising for selecting efficient SPs, without requiring training on experimental efficiency data.
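To illustrate what ranking candidates by perplexity could look like, here is a minimal sketch in Python. It is not SecretoGen's code: the function names, the candidate labels, and the log-probability values are hypothetical, and the sketch assumes that a generative model has already produced a natural-log probability for each residue of each candidate SP. It also assumes the usual language-model convention that lower perplexity marks a sequence the model finds more plausible.

```python
import math


def perplexity(token_log_probs):
    """Return exp of the negative mean natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def rank_by_perplexity(candidates):
    """Return candidate names sorted from lowest to highest perplexity."""
    return sorted(candidates, key=lambda name: perplexity(candidates[name]))


# Hypothetical per-residue log-probabilities, invented for illustration only.
# In practice these would come from a generative model scoring each SP.
candidates = {
    "sp_a": [math.log(0.5), math.log(0.5), math.log(0.5)],
    "sp_b": [math.log(0.25), math.log(0.25), math.log(0.25)],
    "sp_c": [math.log(0.5), math.log(0.03125)],
}

ranking = rank_by_perplexity(candidates)
for name in ranking:
    print(name, round(perplexity(candidates[name]), 3))
```

Because perplexity is normalized by sequence length, candidates of different lengths can be compared on the same scale, which matters when SPs vary in length.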

Cite

Text

Teufel et al. "SecretoGen: Towards Prediction of Signal Peptides for Efficient Protein Secretion." NeurIPS 2023 Workshops: GenBio, 2023.

Markdown

[Teufel et al. "SecretoGen: Towards Prediction of Signal Peptides for Efficient Protein Secretion." NeurIPS 2023 Workshops: GenBio, 2023.](https://mlanthology.org/neuripsw/2023/teufel2023neuripsw-secretogen/)

BibTeX

@inproceedings{teufel2023neuripsw-secretogen,
  title     = {{SecretoGen: Towards Prediction of Signal Peptides for Efficient Protein Secretion}},
  author    = {Teufel, Felix and Stahlhut, Carsten and Refsgaard, Jan and Nielsen, Henrik and Winther, Ole and Madsen, Dennis},
  booktitle = {NeurIPS 2023 Workshops: GenBio},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/teufel2023neuripsw-secretogen/}
}