JoLT: Jointly Learned Representations of Language and Time-Series
Abstract
Time-series and text data are prevalent in healthcare and frequently exist in tandem, e.g., in electrocardiogram (ECG) interpretation reports. Yet, these modalities are typically modeled independently. Even studies that model time-series and text jointly do so by converting the time-series to images or graphs. We hypothesize that explicitly modeling time-series jointly with text can improve tasks such as summarization and question answering for time-series data, which have received little attention so far. To address this gap, we introduce JoLT to jointly learn desired representations from pre-trained time-series and text models. JoLT utilizes a Querying Transformer (Q-Former) to align the time-series and text representations. Our experiments on a large real-world electrocardiography dataset for medical time-series summarization show that JoLT outperforms state-of-the-art image captioning and medical question-answering approaches, and that the decoder architecture, size, and pre-training data affect performance on these tasks.
Cite
Text
Cai et al. "JoLT: Jointly Learned Representations of Language and Time-Series." NeurIPS 2023 Workshops: DGM4H, 2023.
Markdown
[Cai et al. "JoLT: Jointly Learned Representations of Language and Time-Series." NeurIPS 2023 Workshops: DGM4H, 2023.](https://mlanthology.org/neuripsw/2023/cai2023neuripsw-jolt/)
BibTeX
@inproceedings{cai2023neuripsw-jolt,
title = {{JoLT: Jointly Learned Representations of Language and Time-Series}},
author = {Cai, Yifu and Goswami, Mononito and Choudhry, Arjun and Srinivasan, Arvind and Dubrawski, Artur},
booktitle = {NeurIPS 2023 Workshops: DGM4H},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/cai2023neuripsw-jolt/}
}