CellPLM: Pre-Training of Cell Language Model Beyond Single Cells
Abstract
The current state-of-the-art single-cell pre-trained models are greatly inspired by the success of large language models. They train transformers by treating genes as tokens and cells as sentences. However, three fundamental differences between single-cell data and natural language data are overlooked: (1) scRNA-seq data are presented as bag-of-genes instead of sequences of RNAs; (2) cell-cell relations are more intricate and important than inter-sentence relations; and (3) the quantity of single-cell data is considerably smaller than that of text data, and the data are very noisy. In light of these characteristics, we propose a new pre-trained model, $\textit{CellPLM}$, which takes cells as tokens and tissues as sentences. In addition, we leverage spatially-resolved transcriptomic data in pre-training to facilitate learning cell-cell relationships and introduce a Gaussian prior distribution as an additional inductive bias to overcome data limitations. $\textit{CellPLM}$ is the first single-cell pre-trained transformer that encodes cell-cell relations, and it consistently outperforms existing pre-trained and non-pre-trained models on diverse downstream tasks, with 100 times faster inference when generating cell embeddings than previous pre-trained models.
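The core modeling idea described in the abstract, treating each cell as a token and a tissue as a sentence, with a Gaussian prior on the latent space, can be illustrated with a minimal sketch. This is not the authors' implementation: the module name, dimensions, and the simple VAE-style KL regularizer below are illustrative assumptions.

```python
# Illustrative sketch (not CellPLM's actual code): cells as tokens, tissues as sentences.
# Assumptions: each cell's expression vector is linearly embedded into a token,
# a transformer attends across cells within one tissue, and a KL term to a
# standard Gaussian prior serves as the extra inductive bias.
import torch
import torch.nn as nn

class CellsAsTokensEncoder(nn.Module):
    def __init__(self, n_genes=1000, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.cell_embed = nn.Linear(n_genes, d_model)            # one token per cell
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)    # attention across cells in a tissue
        self.to_mu = nn.Linear(d_model, d_model)
        self.to_logvar = nn.Linear(d_model, d_model)

    def forward(self, expr):
        # expr: (tissues, cells_per_tissue, n_genes) -- a "sentence" of cell tokens
        tokens = self.cell_embed(expr)
        h = self.encoder(tokens)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # KL divergence to a standard Gaussian prior regularizes the cell embeddings
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return z, kl

model = CellsAsTokensEncoder()
z, kl = model(torch.rand(2, 32, 1000))  # 2 tissues, 32 cells each
print(z.shape, kl.item())
```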
Cite
Text
Wen et al. "CellPLM: Pre-Training of Cell Language Model Beyond Single Cells." International Conference on Learning Representations, 2024.

Markdown
[Wen et al. "CellPLM: Pre-Training of Cell Language Model Beyond Single Cells." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/wen2024iclr-cellplm/)

BibTeX
@inproceedings{wen2024iclr-cellplm,
title = {{CellPLM: Pre-Training of Cell Language Model Beyond Single Cells}},
author = {Wen, Hongzhi and Tang, Wenzhuo and Dai, Xinnan and Ding, Jiayuan and Jin, Wei and Xie, Yuying and Tang, Jiliang},
booktitle = {International Conference on Learning Representations},
year = {2024},
url = {https://mlanthology.org/iclr/2024/wen2024iclr-cellplm/}
}