Identifying Structure in the MIMIC ICU Dataset

Abstract

The MIMIC-III dataset, containing trajectories of 40,000 ICU patients, is one of the most popular datasets in the machine learning for health space. However, there has been very little systematic exploration of the natural structure of these data: most analyses enforce some type of top-down clustering or embedding. We take a bottom-up approach, identifying consistent structures that are robust across a range of embedding choices. We identify two dominant structures, sorted by either fraction of inspired oxygen or creatinine, both of which were validated as key features by our clinical co-author. Our bottom-up approach to studying the macro-structure of a dataset can also be adapted for other datasets.

Cite

Text

Chin et al. "Identifying Structure in the MIMIC ICU Dataset." NeurIPS 2022 Workshops: TS4H, 2022.

Markdown

[Chin et al. "Identifying Structure in the MIMIC ICU Dataset." NeurIPS 2022 Workshops: TS4H, 2022.](https://mlanthology.org/neuripsw/2022/chin2022neuripsw-identifying/)

BibTeX

@inproceedings{chin2022neuripsw-identifying,
  title     = {{Identifying Structure in the MIMIC ICU Dataset}},
  author    = {Chin, Zad and Raval, Shivam and Doshi-Velez, Finale and Wattenberg, Martin and Celi, Leo Anthony},
  booktitle = {NeurIPS 2022 Workshops: TS4H},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/chin2022neuripsw-identifying/}
}