Masked Modeling for Single-Cell Clustering of scRNA‐seq Data

Abstract

Single-cell clustering of scRNA-seq data is a common and challenging problem: the goal is to predict cell subtype clusters from gene expression sequences in single-cell RNA data. Previous approaches applied classical methods (e.g., Principal Component Analysis, K-means) to well-annotated data to classify cells. However, they relied heavily on the expected number of clusters as an input. To address this problem, we propose mask-sc, a novel multimodal self-supervised framework with masked expression modeling on single-cell data that learns compact and discriminative representations by reconstructing masked gene expression for scRNA-seq clustering. mask-sc aggregates high-frequency interconnections across multiple groups of expression sequences via a masked expression encoder applied to expression matrices. A sequence-guided decoder then recovers sequence-level features of the masked expression matrices. Finally, the representations extracted from the gene expression encoder are used for scRNA-seq clustering. We conduct extensive experiments on two scRNA-seq datasets, and the empirical results demonstrate the effectiveness of mask-sc against previous baselines.
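
The masking-and-reconstruction objective described above can be illustrated with a minimal sketch. This is not the mask-sc implementation: the function names (`mask_expression`, `masked_mse`), the 50% mask ratio, the zero fill value, and the toy Poisson counts are all assumptions made for illustration, and a per-gene mean over visible entries stands in for the learned encoder and decoder.

```python
import numpy as np


def mask_expression(x, mask_ratio=0.5, seed=0):
    """Randomly hide a fraction of entries in a cells-by-genes matrix.

    Hypothetical helper: returns the masked matrix (hidden entries set to 0)
    and a boolean mask that is True where an entry was hidden.
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(x.shape) < mask_ratio
    x_masked = np.where(mask, 0.0, x)
    return x_masked, mask


def masked_mse(x_true, x_pred, mask):
    """Mean squared error computed over the hidden entries only."""
    if not mask.any():
        return 0.0
    diff = (x_pred - x_true)[mask]
    return float(np.mean(diff ** 2))


# Toy expression matrix: 6 cells x 8 genes (synthetic counts, not real data).
rng = np.random.default_rng(42)
expr = rng.poisson(2.0, size=(6, 8)).astype(float)

expr_masked, mask = mask_expression(expr, mask_ratio=0.5, seed=0)

# Stand-in "model": fill each hidden entry with that gene's mean over the
# visible cells. A real framework would use a trained encoder and decoder.
visible = ~mask
visible_counts = np.maximum(visible.sum(axis=0), 1)  # avoid division by zero
gene_means = (expr * visible).sum(axis=0) / visible_counts
recon = np.where(mask, gene_means, expr)

loss = masked_mse(expr, recon, mask)
print(f"masked reconstruction MSE: {loss:.4f}")
```

Restricting the loss to hidden entries is the usual design choice in masked modeling, since it forces the model to infer missing values from context rather than copy its input. In a full pipeline the encoder's representations, not the reconstructions, would be passed to a clustering step.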

Cite

Text

Mo. "Masked Modeling for Single-Cell Clustering of scRNA‐seq Data." NeurIPS 2024 Workshops: AIM-FM, 2024.

Markdown

[Mo. "Masked Modeling for Single-Cell Clustering of scRNA‐seq Data." NeurIPS 2024 Workshops: AIM-FM, 2024.](https://mlanthology.org/neuripsw/2024/mo2024neuripsw-masked/)

BibTeX

@inproceedings{mo2024neuripsw-masked,
  title     = {{Masked Modeling for Single-Cell Clustering of scRNA‐seq Data}},
  author    = {Mo, Shentong},
  booktitle = {NeurIPS 2024 Workshops: AIM-FM},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/mo2024neuripsw-masked/}
}