Revisiting Masked Auto-Encoders for ECG-Language Representation Learning
Abstract
We propose C-MELT, a novel framework for multimodal self-supervised learning of Electrocardiogram (ECG) and text encoders. C-MELT pre-trains a contrastive-enhanced masked auto-encoder architecture on ECG-text paired data. It combines the generative strengths of masked auto-encoding with improved discriminative capabilities to enable robust cross-modal alignment. This is accomplished through a carefully designed model, loss functions, and a novel negative sampling strategy. Our preliminary experiments demonstrate significant performance improvements of up to 12% on downstream cardiac arrhythmia classification and patient identification tasks. These findings show C-MELT's capacity to extract rich, clinically relevant features from ECG-text pairs, paving the way for more accurate and efficient cardiac diagnoses in real-world healthcare settings.
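To make the pre-training objective described above concrete, the sketch below shows one common way to combine a masked-reconstruction term with a symmetric cross-modal contrastive term. This is a minimal illustration under assumed shapes and hypothetical names (`combined_pretraining_loss`, `ecg_recon`, `ecg_emb`, `text_emb`), not the paper's exact losses or negative sampling strategy.

```python
import torch
import torch.nn.functional as F


def combined_pretraining_loss(ecg_recon, ecg_target, mask,
                              ecg_emb, text_emb, temperature=0.07,
                              recon_weight=1.0, contrastive_weight=1.0):
    """Illustrative objective: masked ECG reconstruction + ECG-text contrastive alignment.

    ecg_recon, ecg_target: (B, N, D) decoder outputs and ground-truth ECG patches.
    mask: (B, N) binary mask, 1 where a patch was masked (reconstruction is scored
          only on masked patches, as in standard masked auto-encoding).
    ecg_emb, text_emb: (B, E) pooled ECG and text embeddings.
    """
    # Masked-patch reconstruction: MSE averaged over masked positions only.
    per_patch_mse = ((ecg_recon - ecg_target) ** 2).mean(dim=-1)            # (B, N)
    recon_loss = (per_patch_mse * mask).sum() / mask.sum().clamp(min=1)

    # Symmetric InfoNCE between L2-normalized ECG and text embeddings;
    # other samples in the batch act as in-batch negatives.
    ecg_emb = F.normalize(ecg_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = ecg_emb @ text_emb.t() / temperature                           # (B, B)
    labels = torch.arange(logits.size(0), device=logits.device)
    contrastive_loss = 0.5 * (F.cross_entropy(logits, labels) +
                              F.cross_entropy(logits.t(), labels))

    return recon_weight * recon_loss + contrastive_weight * contrastive_loss
```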
Cite
Text
Hung et al. "Revisiting Masked Auto-Encoders for ECG-Language Representation Learning." NeurIPS 2024 Workshops: TSALM, 2024.
Markdown
[Hung et al. "Revisiting Masked Auto-Encoders for ECG-Language Representation Learning." NeurIPS 2024 Workshops: TSALM, 2024.](https://mlanthology.org/neuripsw/2024/hung2024neuripsw-revisiting/)
BibTeX
@inproceedings{hung2024neuripsw-revisiting,
  title = {{Revisiting Masked Auto-Encoders for ECG-Language Representation Learning}},
  author = {Hung, Manh Pham and Saeed, Aaqib and Ma, Dong},
  booktitle = {NeurIPS 2024 Workshops: TSALM},
  year = {2024},
  url = {https://mlanthology.org/neuripsw/2024/hung2024neuripsw-revisiting/}
}