Instruction Tuning Large Language Models to Understand Electronic Health Records

Abstract

Large language models (LLMs) have shown impressive capabilities in solving a wide range of tasks based on human instructions. However, developing a conversational AI assistant for electronic health record (EHR) data remains challenging due to (1) the lack of large-scale instruction-following datasets and (2) the limitations of existing model architectures in handling complex and heterogeneous EHR data.In this paper, we introduce MIMIC-Instr, a dataset comprising over 400K open-ended instruction-following examples derived from the MIMIC-IV EHR database. This dataset covers various topics and is suitable for instruction-tuning general-purpose LLMs for diverse clinical use cases. Additionally, we propose Llemr, a general framework that enables LLMs to process and interpret EHRs with complex data structures. Llemr demonstrates competitive performance in answering a wide range of patient-related questions based on EHR data.Furthermore, our evaluations on clinical predictive modeling benchmarks reveal that the fine-tuned Llemr achieves performance comparable to state-of-the-art (SOTA) baselines using curated features. The dataset and code are available at \url{https://github.com/zzachw/llemr}.

Cite

Text

Wu et al. "Instruction Tuning Large Language Models to Understand Electronic Health Records." Neural Information Processing Systems, 2024. doi:10.52202/079017-1737

Markdown

[Wu et al. "Instruction Tuning Large Language Models to Understand Electronic Health Records." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/wu2024neurips-instruction/) doi:10.52202/079017-1737

BibTeX

@inproceedings{wu2024neurips-instruction,
  title     = {{Instruction Tuning Large Language Models to Understand Electronic Health Records}},
  author    = {Wu, Zhenbang and Dadu, Anant and Nalls, Mike and Faghri, Faraz and Sun, Jimeng},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-1737},
  url       = {https://mlanthology.org/neurips/2024/wu2024neurips-instruction/}
}