A Large-Scale Foundation Model for RNA Function and Structure Prediction

Abstract

Originally marginalized as an intermediate in the information flow from DNA to protein, RNA has become the star of modern biology, holding the key to precision therapeutics, genetic engineering, evolutionary origins, and our understanding of fundamental cellular processes. Yet RNA is as mysterious as it is prolific, serving as an information store, a messenger, and a catalyst, and spanning many undercharacterized functional and structural classes. Deciphering the language of RNA is important not only for a mechanistic understanding of its biological functions but also for accelerating drug design. Toward this goal, we introduce AIDO.RNA, a pre-trained module for RNA in an AI-driven Digital Organism [1]. AIDO.RNA has 1.6 billion parameters and is trained on 42 million non-coding RNA (ncRNA) sequences at single-nucleotide resolution. It achieves state-of-the-art performance on a comprehensive set of tasks, including structure prediction, genetic regulation, molecular function across species, and RNA sequence design. After domain adaptation, AIDO.RNA learns to model essential parts of protein translation that protein language models, which have received widespread attention in recent years, do not. More broadly, AIDO.RNA hints at the generality of biological sequence modeling and the ability to leverage the central dogma to improve many biomolecular representations. Models and code are available through ModelGenerator at https://github.com/genbio-ai/AIDO and on Hugging Face.

Cite

Text

Zou et al. "A Large-Scale Foundation Model for RNA Function and Structure Prediction." NeurIPS 2024 Workshops: AIDrugX, 2024.

Markdown

[Zou et al. "A Large-Scale Foundation Model for RNA Function and Structure Prediction." NeurIPS 2024 Workshops: AIDrugX, 2024.](https://mlanthology.org/neuripsw/2024/zou2024neuripsw-largescale/)

BibTeX

@inproceedings{zou2024neuripsw-largescale,
  title     = {{A Large-Scale Foundation Model for RNA Function and Structure Prediction}},
  author    = {Zou, Shuxian and Tao, Tianhua and Mahbub, Sazan and Ellington, Caleb and Algayres, Robin Jonathan and Li, Dian and Zhuang, Yonghao and Wang, Hongyi and Song, Le and Xing, Eric P.},
  booktitle = {NeurIPS 2024 Workshops: AIDrugX},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/zou2024neuripsw-largescale/}
}