MMIST-ccRCC: A Real World Medical Dataset for the Development of Multi-Modal Systems

Abstract

The acquisition of different data modalities can enhance our knowledge and understanding of various diseases, paving the way for more personalized healthcare. Thus, medicine is progressively moving towards the generation of massive amounts of multi-modal data (e.g., molecular, radiology, and histopathology). While this may seem like an ideal environment to capitalize on data-centric machine learning approaches, most methods still focus on exploring a single modality or a pair of modalities for a variety of reasons: i) the lack of ready-to-use curated datasets; ii) the difficulty of identifying the best multi-modal fusion strategy; and iii) modalities missing across patients. In this paper, we introduce a real-world multi-modal dataset called MMIST-ccRCC that comprises two radiology modalities (CT and MRI), histopathology, genomics, and clinical data from 618 patients with clear cell renal cell carcinoma (ccRCC). We provide single-modal and multi-modal (early and late fusion) benchmarks on the task of 12-month survival prediction in the challenging scenario where one or more modalities may be missing for each patient, with missing rates ranging from 26% for genomics data to more than 90% for MRI. We show that, even with such severe missing rates, fusing modalities improves survival forecasting. Additionally, incorporating a strategy that generates latent representations of the missing modalities from the available ones further improves performance, highlighting a potential complementarity across modalities. Our dataset and code are available at multi-modal-ist.github.io/datasets/ccRCC.
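The late-fusion setting with missing modalities described above can be illustrated with a minimal sketch. It assumes, hypothetically, that a separate model per modality outputs a probability of 12-month survival and that late fusion takes a weighted average over only the modalities available for a given patient. The names `late_fusion`, `missing_rate`, `Record`, and the modality keys are illustrative and do not come from the authors' released code, and the numbers are made up.

```python
from typing import Dict, List, Optional

# Hypothetical per-patient record: modality name -> predicted probability of
# 12-month survival from that modality's model, or None if the modality is missing.
Record = Dict[str, Optional[float]]


def late_fusion(record: Record, weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of the predictions of the modalities that are present.

    Missing modalities (None or absent keys) are skipped and the weights of the
    remaining ones are renormalized, so any patient with at least one available
    modality still receives a fused prediction.
    """
    available = {m: p for m, p in record.items() if p is not None}
    if not available:
        raise ValueError("at least one modality must be available")
    weights = weights or {}
    # Modalities without an explicit weight default to 1.0 (plain averaging).
    w = {m: weights.get(m, 1.0) for m in available}
    total = sum(w.values())
    if total <= 0:
        raise ValueError("weights of the available modalities must sum to a positive value")
    return sum(w[m] * p for m, p in available.items()) / total


def missing_rate(cohort: List[Record], modality: str) -> float:
    """Fraction of patients in the cohort for whom `modality` is missing."""
    if not cohort:
        raise ValueError("cohort must not be empty")
    missing = sum(1 for record in cohort if record.get(modality) is None)
    return missing / len(cohort)


if __name__ == "__main__":
    # Toy cohort with made-up values, for illustration only.
    cohort = [
        {"clinical": 0.8, "genomics": 0.6, "ct": None, "mri": None, "histopathology": 0.7},
        {"clinical": 0.4, "genomics": None, "ct": 0.5, "mri": 0.3, "histopathology": None},
    ]
    for record in cohort:
        print(f"fused 12-month survival probability: {late_fusion(record):.3f}")
    print(f"MRI missing rate: {missing_rate(cohort, 'mri'):.0%}")
```

Skipping missing modalities and renormalizing the weights is the simplest way to keep every patient in the evaluation. It is not the latent-representation generation strategy mentioned in the abstract, which instead synthesizes stand-ins for the missing modalities from the available ones before fusion.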

Cite

Text

Mota et al. "MMIST-ccRCC: A Real World Medical Dataset for the Development of Multi-Modal Systems." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00246

Markdown

[Mota et al. "MMIST-ccRCC: A Real World Medical Dataset for the Development of Multi-Modal Systems." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/mota2024cvprw-mmistccrcc/) doi:10.1109/CVPRW63382.2024.00246

BibTeX

@inproceedings{mota2024cvprw-mmistccrcc,
  title     = {{MMIST-ccRCC: A Real World Medical Dataset for the Development of Multi-Modal Systems}},
  author    = {Mota, Tiago and Verdelho, Maria Rita and Araújo, Diogo J. and Bissoto, Alceu and Santiago, Carlos and Barata, Catarina},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {2395--2403},
  doi       = {10.1109/CVPRW63382.2024.00246},
  url       = {https://mlanthology.org/cvprw/2024/mota2024cvprw-mmistccrcc/}
}