STimage-1K4M: A Histopathology Image-Gene Expression Dataset for Spatial Transcriptomics
Abstract
Recent advances in multi-modal algorithms have driven and been driven by the increasing availability of large image-text datasets, leading to significant strides in various fields, including computational pathology. However, in most existing medical image-text datasets, the text typically provides high-level summaries that may not sufficiently describe sub-tile regions within a large pathology image. For example, an image might cover an extensive tissue area containing cancerous and healthy regions, but the accompanying text might only specify that this image is a cancer slide, lacking the nuanced details needed for in-depth analysis. In this study, we introduce STimage-1K4M, a novel dataset designed to bridge this gap by providing genomic features for sub-tile images. STimage-1K4M contains 1,149 images derived from spatial transcriptomics data, which captures gene expression information at the level of individual spatial spots within a pathology image. Specifically, each image in the dataset is broken down into smaller sub-image tiles, with each tile paired with 15,000 to 30,000 dimensional gene expressions. With 4,293,195 pairs of sub-tile images and gene expressions, STimage-1K4M offers unprecedented granularity, paving the way for a wide range of advanced research in multi-modal data analysis and innovative applications in computational pathology and beyond.
Cite
Text
Chen et al. "STimage-1K4M: A Histopathology Image-Gene Expression Dataset for Spatial Transcriptomics." Neural Information Processing Systems, 2024. doi:10.52202/079017-1129
Markdown
[Chen et al. "STimage-1K4M: A Histopathology Image-Gene Expression Dataset for Spatial Transcriptomics." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/chen2024neurips-stimage1k4m/) doi:10.52202/079017-1129
BibTeX
@inproceedings{chen2024neurips-stimage1k4m,
title = {{STimage-1K4M: A Histopathology Image-Gene Expression Dataset for Spatial Transcriptomics}},
author = {Chen, Jiawen and Zhou, Muqing and Wu, Wenrong and Zhang, Jinwei and Li, Yun and Li, Didong},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-1129},
url = {https://mlanthology.org/neurips/2024/chen2024neurips-stimage1k4m/}
}