CodonBERT: Large Language Models for mRNA Design and Optimization

Abstract

mRNA-based vaccines and therapeutics are increasingly used across a wide range of conditions. One of the critical issues when designing such mRNAs is sequence optimization. Even a small protein or peptide can be encoded by an enormous number of distinct mRNA sequences. The choice of mRNA sequence can have a large impact on several properties, including expression, stability, and immunogenicity. To enable the selection of an optimal sequence, we developed CodonBERT, a large language model (LLM) for mRNAs. Unlike prior models, CodonBERT uses codons as inputs, which enables it to learn better representations. CodonBERT was trained on more than 10 million mRNA sequences from a diverse set of organisms. The resulting model captures important biological concepts. CodonBERT can also be extended to perform prediction tasks for various mRNA properties. CodonBERT outperforms previous mRNA prediction methods, including on a new flu vaccine dataset.
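
To make two points from the abstract concrete, the following Python sketch splits an mRNA coding sequence into codon tokens and counts how many distinct mRNA sequences encode a given peptide under the standard genetic code. It is only an illustration: the function names `tokenize_codons` and `count_synonymous_mrnas` are hypothetical, and this is not CodonBERT's actual tokenizer or preprocessing, which the abstract does not describe.

```python
from collections import Counter
from itertools import product

# Standard genetic code, with amino acids listed in U, C, A, G order for the
# first, second, and third codon positions ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of synonymous codons available for each amino acid (and for stop).
DEGENERACY = Counter(CODON_TABLE.values())


def tokenize_codons(mrna: str) -> list[str]:
    """Split a coding sequence into non-overlapping 3-nucleotide codon tokens."""
    seq = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    for codon in codons:
        if codon not in CODON_TABLE:
            raise ValueError(f"unknown codon: {codon}")
    return codons


def count_synonymous_mrnas(peptide: str) -> int:
    """Count the mRNA sequences that translate to the given peptide."""
    total = 1
    for aa in peptide.upper():
        if aa not in DEGENERACY or aa == "*":
            raise ValueError(f"unknown amino acid: {aa}")
        total *= DEGENERACY[aa]  # independent codon choice at each position
    return total


if __name__ == "__main__":
    print(tokenize_codons("AUGGCCUAA"))
    print(count_synonymous_mrnas("LSR"))
```

Because the count is a product of per-residue codon choices, it grows exponentially with peptide length, which is why exhaustive enumeration of candidate sequences quickly becomes impractical.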

Cite

Text

Li et al. "CodonBERT: Large Language Models for mRNA Design and Optimization." NeurIPS 2023 Workshops: GenBio, 2023.

Markdown

[Li et al. "CodonBERT: Large Language Models for mRNA Design and Optimization." NeurIPS 2023 Workshops: GenBio, 2023.](https://mlanthology.org/neuripsw/2023/li2023neuripsw-codonbert/)

BibTeX

@inproceedings{li2023neuripsw-codonbert,
  title     = {{CodonBERT: Large Language Models for mRNA Design and Optimization}},
  author    = {Li, Sizhen and Moayedpour, Saeed and Li, Ruijiang and Bailey, Michael and Riahi, Saleh and Miladi, Milad and Miner, Jacob and Zheng, Dinghai and Wang, Jun and Balsubramani, Akshay and Tran, Khang and {Minnie} and Wu, Monica and Gu, Xiaobo and Clinton, Ryan and Asquith, Carla and Skaleski, Joseph and Boeglin, Lianne and Chivukula, Sudha and Dias, Anusha and Montoya, Fernando Ulloa and Agarwal, Vikram and Bar-Joseph, Ziv and Jager, Sven},
  booktitle = {NeurIPS 2023 Workshops: GenBio},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/li2023neuripsw-codonbert/}
}