Self-Supervised Multimodal Model for Astronomy

Abstract

While machine-learned models are now routinely employed to facilitate astronomical inquiry, model inputs tend to be limited to a primary data source (namely images or time series) and, in the more advanced approaches, some metadata. Yet with the growing use of wide-field, multiplexed observational resources, individual sources of interest often have a broad range of observational modes available. Here we construct an astronomical multimodal dataset and propose a self-supervised pre-training approach that enables a model to learn from multiple modalities simultaneously. Specifically, we extend the CLIP (Contrastive Language-Image Pretraining) model to a trimodal setting, allowing the integration of time-series photometry data, spectra, and astrophysical metadata. In a supervised fine-tuning setting, our results demonstrate that CLIP pre-training improves classification performance for time-series photometry, where accuracy increases from 84.6% to 91.5%. Furthermore, CLIP boosts classification accuracy by up to 12.6% when the availability of labeled data is limited, showing the effectiveness of leveraging larger corpora of unlabeled data. To our knowledge, this is the first construction of an n > 2 mode model in astronomy. Extensions to n > 3 modes are naturally anticipated with this approach.
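To illustrate the trimodal contrastive objective described in the abstract, the sketch below (PyTorch; function and variable names are hypothetical, not from the paper) applies a symmetric CLIP-style InfoNCE loss to each pair of modality embeddings (photometry, spectra, metadata) and sums the pairwise terms. The paper's actual loss weighting, temperature, and encoder architectures may differ.

```python
import torch
import torch.nn.functional as F

def clip_pairwise_loss(za, zb, temperature=0.07):
    """Symmetric InfoNCE (CLIP-style) loss between two batches of embeddings."""
    za = F.normalize(za, dim=-1)
    zb = F.normalize(zb, dim=-1)
    logits = za @ zb.T / temperature                    # cosine-similarity logits
    targets = torch.arange(za.size(0), device=za.device)  # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

def trimodal_loss(z_photometry, z_spectra, z_metadata):
    """Sum of pairwise CLIP losses over the three modality embeddings (assumed formulation)."""
    return (clip_pairwise_loss(z_photometry, z_spectra) +
            clip_pairwise_loss(z_photometry, z_metadata) +
            clip_pairwise_loss(z_spectra, z_metadata))
```

In use, each modality would pass through its own encoder to produce a batch of embeddings of equal dimension, and the summed pairwise loss would be minimized during pre-training before the supervised fine-tuning stage reported in the abstract.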

Cite

Text

Rizhko and Bloom. "Self-Supervised Multimodal Model for Astronomy." NeurIPS 2024 Workshops: FM4Science, 2024.

Markdown

[Rizhko and Bloom. "Self-Supervised Multimodal Model for Astronomy." NeurIPS 2024 Workshops: FM4Science, 2024.](https://mlanthology.org/neuripsw/2024/rizhko2024neuripsw-selfsupervised/)

BibTeX

@inproceedings{rizhko2024neuripsw-selfsupervised,
  title     = {{Self-Supervised Multimodal Model for Astronomy}},
  author    = {Rizhko, Mariia and Bloom, Joshua S.},
  booktitle = {NeurIPS 2024 Workshops: FM4Science},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/rizhko2024neuripsw-selfsupervised/}
}