The Multimodal Universe: Enabling Large-Scale Machine Learning with 100 TB of Astronomical Scientific Data
Abstract
We present the Multimodal Universe, a large-scale multimodal dataset of scientific astronomical data, compiled specifically to facilitate machine learning research. Overall, our dataset contains hundreds of millions of astronomical observations, constituting 100 TB of multi-channel and hyper-spectral images, spectra, and multivariate time series, as well as a wide variety of associated scientific measurements and metadata. In addition, we include a range of benchmark tasks representative of standard practices for machine learning methods in astrophysics. This massive dataset will enable the development of large multimodal models specifically targeted towards scientific applications. All code used to compile the dataset and a description of how to access the data are available at https://github.com/MultimodalUniverse/MultimodalUniverse
Cite
Text
Angeloudi et al. "The Multimodal Universe: Enabling Large-Scale Machine Learning with 100 TB of Astronomical Scientific Data." Neural Information Processing Systems, 2024. doi:10.52202/079017-1845
Markdown
[Angeloudi et al. "The Multimodal Universe: Enabling Large-Scale Machine Learning with 100 TB of Astronomical Scientific Data." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/angeloudi2024neurips-multimodal/) doi:10.52202/079017-1845
BibTeX
@inproceedings{angeloudi2024neurips-multimodal,
title = {{The Multimodal Universe: Enabling Large-Scale Machine Learning with 100 TB of Astronomical Scientific Data}},
author = {Angeloudi, Eirini and Audenaert, Jeroen and Bowles, Micah and Boyd, Benjamin M. and Chemaly, David and Cherinka, Brian and Ciucă, Ioana and Cranmer, Miles and Do, Aaron and Grayling, Matthew and Hayes, Erin E. and Hehir, Tom and Ho, Shirley and Huertas-Company, Marc and Iyer, Kartheik G. and Jablonska, Maja and Lanusse, Francois and Leung, Henry W. and Mandel, Kaisey and Martínez-Galarza, Juan Rafael and Melchior, Peter and Meyer, Lucas and Parker, Liam H. and Qu, Helen and Shen, Jeff and Smith, Michael J. and Stone, Connor and Walmsley, Mike and Wu, John F.},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-1845},
url = {https://mlanthology.org/neurips/2024/angeloudi2024neurips-multimodal/}
}