Towards Large-Scale Clinical Multi-Variate Time-Series Datasets

Abstract

Notable progress has been made in generalist medical Large Language Models (LLMs) across various areas of healthcare. However, large-scale modeling of in-hospital time-series data, such as vital signs, lab results, and treatments in Intensive Care Units (ICUs), remains underexplored. Existing ICU datasets are relatively small, but combining them can increase patient diversity and improve model robustness. To generalize across hospitals, models must also cope with distribution shifts caused by differing treatment policies, which requires harmonizing treatment variables across datasets. This work aims to lay a foundation for training large-scale multi-variate time-series models on critical care data, and to provide a benchmark for studying and addressing distribution shift when transferring machine learning models across hospitals. We introduce a harmonized dataset for research in sequence modeling and transfer learning; it is the first large-scale collection to include core treatment variables. We plan to expand this dataset further to support advances in transfer learning and the development of scalable, generalizable models for critical healthcare applications.
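The abstract does not spell out what harmonizing treatment variables involves in practice. As a rough illustration only, and not the authors' pipeline, the sketch below maps dataset-specific records of the same drug infusion onto one shared variable name and one unit before pooling them. Everything concrete here is hypothetical: the source names (`hospital_a`, `hospital_b`), the local variable codes (`norepi_rate`, `NORAD`), the `Record` schema, and the assumption that one site charts norepinephrine in mg/h while the other uses the weight-normalized rate in ug/kg/min.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    patient_id: str  # prefixed with the source so IDs stay unique after pooling
    time_h: float    # hours since ICU admission
    variable: str    # shared (harmonized) variable name
    value: float
    unit: str        # shared (harmonized) unit


# Hypothetical per-source mapping: local code -> (shared name, local unit).
SOURCE_MAPS = {
    "hospital_a": {"norepi_rate": ("norepinephrine", "ug/kg/min")},
    "hospital_b": {"NORAD": ("norepinephrine", "mg/h")},
}

# Hypothetical target unit for each shared variable.
TARGET_UNITS = {"norepinephrine": "ug/kg/min"}


def to_ug_kg_min(value: float, unit: str, weight_kg: float) -> float:
    """Convert an infusion rate to ug/kg/min."""
    if unit == "ug/kg/min":
        return value
    if unit == "mg/h":
        # mg/h -> ug/min (x1000 / 60), then normalize by body weight.
        return value * 1000.0 / 60.0 / weight_kg
    raise ValueError(f"unsupported unit: {unit}")


def harmonize(source: str, rows: list, weights_kg: dict) -> list:
    """Map one source's raw rows onto the shared schema."""
    mapping = SOURCE_MAPS[source]
    out = []
    for row in rows:
        if row["var"] not in mapping:
            continue  # variables without a shared definition are dropped here
        name, local_unit = mapping[row["var"]]
        value = to_ug_kg_min(row["value"], local_unit, weights_kg[row["pid"]])
        out.append(Record(f"{source}:{row['pid']}", row["t"], name, value,
                          TARGET_UNITS[name]))
    return out


# Usage: two sites record the same treatment differently.
a = harmonize("hospital_a",
              [{"pid": "1", "t": 0.5, "var": "norepi_rate", "value": 0.1},
               {"pid": "1", "t": 0.5, "var": "local_only", "value": 3.0}],
              {"1": 70.0})
b = harmonize("hospital_b",
              [{"pid": "7", "t": 1.0, "var": "NORAD", "value": 0.6}],
              {"7": 100.0})
pooled = a + b
for rec in pooled:
    print(rec)
```

After this step, 0.6 mg/h for a 100 kg patient and 0.1 ug/kg/min charted directly end up as the same quantity under the same name. That shared representation is what lets a model trained on one hospital be evaluated on another. A real harmonization effort also has to reconcile sampling frequencies, missingness conventions, and clinical definitions, which this sketch ignores.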

Cite

Text

Burger et al. "Towards Large-Scale Clinical Multi-Variate Time-Series Datasets." NeurIPS 2024 Workshops: TSALM, 2024.

BibTeX

@inproceedings{burger2024neuripsw-largescale,
  title     = {{Towards Large-Scale Clinical Multi-Variate Time-Series Datasets}},
  author    = {Burger, Manuel and Sergeev, Fedor and Londschien, Malte and Chopard, Daphné and Yèche, Hugo and Gerdes, Eike Christian and Leshetkina, Polina and Morgenroth, Alexander and Babür, Zeynep and Bogojeska, Jasmina and Faltys, Martin and Kuznetsova, Rita and Rätsch, Gunnar},
  booktitle = {NeurIPS 2024 Workshops: TSALM},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/burger2024neuripsw-largescale/}
}