Pre-Training of Single-Cell Language Models Through Genetic Pathway Learning
Abstract
State-of-the-art single-cell RNA sequencing (scRNA-seq) techniques have significantly increased the depth and richness of scRNA-seq datasets, contributing to a more comprehensive understanding of cellular biology and facilitating advances across a range of research domains. In this work, we propose a novel $\textbf{S}$ingle-$\textbf{c}$ell Pre-trained $\textbf{L}$anguage $\textbf{M}$odel via Genetic $\textbf{Pa}$thway Learning, named scPaLM, that effectively harnesses scRNA-seq data and enables various downstream applications. scPaLM integrates several innovative designs: ($1$) an embedding process that represents gene information with a reduced token count, improving computational efficiency; ($2$) a genetic pathway learning module that learns discrete representations, enabling collective gene behaviors to be modeled in a data-driven way; ($3$) a training methodology that progressively aggregates cell representations into a designated token during training, with a tailored masking strategy and a token-level contrastive regularizer. scPaLM outperforms baselines by clear margins on various downstream tasks, including cell type annotation, imputation, and cancer drug response prediction. Code will be made public.
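To make the idea of "discrete representations" in design ($2$) more concrete, the sketch below shows one generic way to discretize embeddings: assigning each vector to its nearest entry in a codebook. This is an illustrative assumption, not the scPaLM module itself; the abstract does not specify how the discrete representations are learned, and the names `quantize`, `gene_embeddings`, and `codebook`, as well as the toy values, are hypothetical.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    embeddings: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> List[int]:
    """Return, for each embedding, the index of its nearest codebook vector.

    Assumption: nearest-neighbour lookup stands in for whatever discretization
    the actual model uses; the codebook here is fixed rather than learned.
    """
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(e, codebook[k]))
        for e in embeddings
    ]


# Hypothetical toy data: three 2-D gene-token embeddings and two codebook
# entries, each entry playing the role of one discrete "pathway" code.
gene_embeddings = [[0.9, 0.1], [0.1, 1.0], [1.1, -0.1]]
codebook = [[1.0, 0.0], [0.0, 1.0]]

codes = quantize(gene_embeddings, codebook)
print(codes)
```

In this toy setting, embeddings that land on the same code are treated as behaving collectively, which is the intuition behind grouping genes in a data-driven way rather than from a curated pathway database.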
Cite
Text
Chen et al. "Pre-Training of Single-Cell Language Models Through Genetic Pathway Learning." ICML 2024 Workshops: AccMLBio, 2024.
Markdown
[Chen et al. "Pre-Training of Single-Cell Language Models Through Genetic Pathway Learning." ICML 2024 Workshops: AccMLBio, 2024.](https://mlanthology.org/icmlw/2024/chen2024icmlw-pretraining/)
BibTeX
@inproceedings{chen2024icmlw-pretraining,
title = {{Pre-Training of Single-Cell Language Models Through Genetic Pathway Learning}},
author = {Chen, Xuxi and Wang, Zhangyang and Zitnik, Marinka and Kellis, Manolis and Chen, Tianlong},
booktitle = {ICML 2024 Workshops: AccMLBio},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/chen2024icmlw-pretraining/}
}