Revisiting Masked Auto-Encoders for ECG-Language Representation Learning

Abstract

We propose C-MELT, a novel framework for multimodal self-supervised learning of Electrocardiogram (ECG) and text encoders. C-MELT pre-trains a contrastive-enhanced masked auto-encoder architecture on ECG-text paired data, combining the generative strengths of masked auto-encoding with improved discriminative capabilities to enable robust cross-modal alignment. This is accomplished through a carefully designed model, loss functions, and a novel negative sampling strategy. Our preliminary experiments demonstrate significant performance improvements of up to 12% on downstream cardiac arrhythmia classification and patient identification tasks. These findings show C-MELT's capacity to extract rich, clinically relevant features from ECG-text pairs, paving the way for more accurate and efficient cardiac diagnoses in real-world healthcare settings.
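
At a high level, the abstract describes an objective that pairs a masked-reconstruction (generative) term on the ECG signal with a contrastive (discriminative) term that aligns ECG and text embeddings. Below is a minimal sketch of that combination, not the authors' implementation: the toy encoders, masking scheme, in-batch negatives, and equal loss weighting are all illustrative assumptions.

# Minimal sketch (assumptions, not the C-MELT architecture): a masked
# reconstruction loss on ECG plus a symmetric contrastive loss between
# ECG and text embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyECGTextModel(nn.Module):
    def __init__(self, ecg_dim=12 * 250, text_dim=768, embed_dim=256):
        super().__init__()
        # Toy MLP encoder/decoder standing in for the masked auto-encoder.
        self.ecg_encoder = nn.Sequential(nn.Linear(ecg_dim, 512), nn.GELU(), nn.Linear(512, embed_dim))
        self.ecg_decoder = nn.Sequential(nn.Linear(embed_dim, 512), nn.GELU(), nn.Linear(512, ecg_dim))
        # Assumes precomputed text features (e.g. from a frozen language model).
        self.text_proj = nn.Linear(text_dim, embed_dim)
        self.logit_scale = nn.Parameter(torch.tensor(2.0))

    def forward(self, ecg, text_feat, mask_ratio=0.5):
        # Generative branch: mask part of the ECG and reconstruct the masked values.
        mask = (torch.rand_like(ecg) < mask_ratio).float()
        z_ecg = self.ecg_encoder(ecg * (1 - mask))
        recon = self.ecg_decoder(z_ecg)
        loss_mae = ((recon - ecg) ** 2 * mask).sum() / mask.sum().clamp(min=1)

        # Discriminative branch: symmetric InfoNCE between ECG and text embeddings,
        # using other items in the batch as negatives (a simple stand-in for the
        # paper's negative sampling strategy).
        e = F.normalize(z_ecg, dim=-1)
        t = F.normalize(self.text_proj(text_feat), dim=-1)
        logits = self.logit_scale.exp() * e @ t.t()
        targets = torch.arange(e.size(0), device=e.device)
        loss_clip = 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

        return loss_mae + loss_clip  # equal weighting is an assumption


if __name__ == "__main__":
    model = ToyECGTextModel()
    ecg = torch.randn(8, 12 * 250)    # e.g. a 12-lead ECG segment, flattened
    text_feat = torch.randn(8, 768)   # e.g. pooled report embeddings
    loss = model(ecg, text_feat)
    loss.backward()
    print(f"combined loss: {loss.item():.4f}")

The reconstruction term encourages the ECG encoder to retain signal-level detail, while the contrastive term ties the learned ECG representation to the paired report; the paper's contribution lies in how these pieces and the negative sampling are designed, which this sketch does not reproduce.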

Cite

Text

Hung et al. "Revisiting Masked Auto-Encoders for ECG-Language Representation Learning." NeurIPS 2024 Workshops: TSALM, 2024.

Markdown

[Hung et al. "Revisiting Masked Auto-Encoders for ECG-Language Representation Learning." NeurIPS 2024 Workshops: TSALM, 2024.](https://mlanthology.org/neuripsw/2024/hung2024neuripsw-revisiting/)

BibTeX

@inproceedings{hung2024neuripsw-revisiting,
  title     = {{Revisiting Masked Auto-Encoders for ECG-Language Representation Learning}},
  author    = {Hung, Manh Pham and Saeed, Aaqib and Ma, Dong},
  booktitle = {NeurIPS 2024 Workshops: TSALM},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/hung2024neuripsw-revisiting/}
}