Cell2Sentence: Teaching Large Language Models the Language of Biology
Abstract
We introduce Cell2Sentence (C2S), a novel method to directly adapt large language models to a biological context, specifically single-cell transcriptomics. By transforming gene expression data into "cell sentences," C2S bridges the gap between natural language processing and biology. We demonstrate that cell sentences enable the fine-tuning of language models for diverse tasks in biology, including cell generation, complex cell-type annotation, and direct data-driven text generation. Our experiments reveal that GPT-2, when fine-tuned with C2S, can generate biologically valid cells based on cell type inputs and accurately predict cell types from cell sentences. This illustrates that language models, through C2S fine-tuning, can acquire a significant understanding of single-cell biology while maintaining robust text generation capabilities. C2S offers a flexible, accessible framework to integrate natural language processing with transcriptomics, utilizing existing models and libraries for a wide range of biological applications.
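The abstract does not spell out how expression data becomes a "cell sentence." As a minimal sketch, assuming a cell sentence is the list of a cell's gene names ordered from highest to lowest expression with unexpressed genes dropped, the transformation might look like the following. The function name `cell_to_sentence`, the `top_k` parameter, and the toy counts are hypothetical and for illustration only; this is not the authors' implementation.

```python
from typing import Optional, Sequence


def cell_to_sentence(
    expression: Sequence[float],
    gene_names: Sequence[str],
    top_k: Optional[int] = None,
) -> str:
    """Turn one cell's expression vector into a space-separated gene-name string.

    Assumption (not stated in the abstract): genes are ranked by descending
    expression and genes with zero expression are omitted.
    """
    if len(expression) != len(gene_names):
        raise ValueError("expression and gene_names must have the same length")

    # sorted() is stable, so genes with equal expression keep their input order.
    order = sorted(range(len(expression)), key=lambda i: -expression[i])

    # Keep only genes that are actually expressed in this cell.
    ranked = [gene_names[i] for i in order if expression[i] > 0]

    # Optionally truncate to the top_k most highly expressed genes.
    if top_k is not None:
        ranked = ranked[:top_k]
    return " ".join(ranked)


# Toy example with made-up counts for four genes.
genes = ["CD3E", "MS4A1", "LYZ", "NKG7"]
counts = [5, 0, 12, 3]
sentence = cell_to_sentence(counts, genes)
print(sentence)  # LYZ CD3E NKG7
```

A string in this form can be passed to an ordinary text tokenizer, which is what would let an off-the-shelf language model such as GPT-2 be fine-tuned on it without architectural changes.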
Cite
Text
Levine et al. "Cell2Sentence: Teaching Large Language Models the Language of Biology." International Conference on Machine Learning, 2024.
Markdown
[Levine et al. "Cell2Sentence: Teaching Large Language Models the Language of Biology." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/levine2024icml-cell2sentence/)
BibTeX
@inproceedings{levine2024icml-cell2sentence,
title = {{Cell2Sentence: Teaching Large Language Models the Language of Biology}},
author = {Levine, Daniel and Rizvi, Syed A and Lévy, Sacha and Pallikkavaliyaveetil, Nazreen and Zhang, David and Chen, Xingyu and Ghadermarzi, Sina and Wu, Ruiming and Zheng, Zihe and Vrkic, Ivan and Zhong, Anna and Raskin, Daphne and Han, Insu and De Oliveira Fonseca, Antonio Henrique and Ortega Caro, Josue and Karbasi, Amin and Dhodapkar, Rahul Madhav and Van Dijk, David},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {27299--27325},
volume = {235},
url = {https://mlanthology.org/icml/2024/levine2024icml-cell2sentence/}
}