ClimSim: A Large Multi-Scale Dataset for Hybrid Physics-ML Climate Emulation
Abstract
Modern climate projections lack adequate spatial and temporal resolution due to computational constraints. A consequence is inaccurate and imprecise predictions of critical processes such as storms. Hybrid methods that combine physics with machine learning (ML) have introduced a new generation of higher fidelity climate simulators that can sidestep Moore's Law by outsourcing compute-hungry, short, high-resolution simulations to ML emulators. However, this hybrid ML-physics simulation approach requires domain-specific treatment and has been inaccessible to ML experts because of a lack of training data and relevant, easy-to-use workflows. We present ClimSim, the largest-ever dataset designed for hybrid ML-physics research. It comprises multi-scale climate simulations, developed by a consortium of climate scientists and ML researchers. It consists of 5.7 billion pairs of multivariate input and output vectors that isolate the influence of locally-nested, high-resolution, high-fidelity physics on a host climate simulator's macro-scale physical state. The dataset is global in coverage, spans multiple years at high sampling frequency, and is designed such that resulting emulators are compatible with downstream coupling into operational climate simulators. We implement a range of deterministic and stochastic regression baselines to highlight the ML challenges and their scoring. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res) and code (https://leap-stc.github.io/ClimSim) are released openly to support the development of hybrid ML-physics and high-fidelity climate simulations for the benefit of science and society.
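The abstract frames the ML task as regression from an input vector (the host simulator's macro-scale state) to an output vector (the influence of the nested high-resolution physics), scored against held-out data. As a minimal sketch of the simplest kind of deterministic baseline for such a task, the snippet below fits a multi-output linear least-squares map and reports per-output RMSE and R². It runs on synthetic data only: the array sizes, the data, and the helper names `fit_linear_emulator`, `predict`, and `r2_per_output` are hypothetical and are not taken from the ClimSim code, data layout, or published baselines.

```python
import numpy as np

def fit_linear_emulator(X, Y):
    """Least-squares fit of Y ~ X @ W + b (hypothetical deterministic multi-output baseline)."""
    # Append a column of ones so the bias b is estimated together with the weights W.
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return coef[:-1], coef[-1]  # W: (n_in, n_out), b: (n_out,)

def predict(W, b, X):
    """Map input vectors to predicted output vectors."""
    return X @ W + b

def r2_per_output(Y_true, Y_pred):
    """Coefficient of determination computed separately for each output variable."""
    ss_res = np.sum((Y_true - Y_pred) ** 2, axis=0)
    ss_tot = np.sum((Y_true - Y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data; these sizes are illustrative assumptions, not ClimSim's.
rng = np.random.default_rng(0)
n_samples, n_in, n_out = 2000, 8, 4
W_true = rng.normal(size=(n_in, n_out))
b_true = rng.normal(size=n_out)
X = rng.normal(size=(n_samples, n_in))
Y = X @ W_true + b_true + 0.1 * rng.normal(size=(n_samples, n_out))

# Hold out the last 500 samples for scoring.
train, test = slice(0, 1500), slice(1500, None)
W, b = fit_linear_emulator(X[train], Y[train])
Y_hat = predict(W, b, X[test])

rmse = np.sqrt(np.mean((Y[test] - Y_hat) ** 2, axis=0))
r2 = r2_per_output(Y[test], Y_hat)
print("RMSE per output:", np.round(rmse, 3))
print("R^2 per output:", np.round(r2, 3))
```

Scoring each output variable separately, rather than pooling them, keeps a poorly predicted variable from being masked by well-predicted ones; a stochastic baseline would additionally need a probabilistic score rather than RMSE and R² alone.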
Cite
Text
Yu et al. "ClimSim: A Large Multi-Scale Dataset for Hybrid Physics-ML Climate Emulation." Neural Information Processing Systems, 2023.
Markdown
[Yu et al. "ClimSim: A Large Multi-Scale Dataset for Hybrid Physics-ML Climate Emulation." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/yu2023neurips-climsim/)
BibTeX
@inproceedings{yu2023neurips-climsim,
title = {{ClimSim: A Large Multi-Scale Dataset for Hybrid Physics-ML Climate Emulation}},
author = {Yu, Sungduk and Hannah, Walter and Peng, Liran and Lin, Jerry and Bhouri, Mohamed Aziz and Gupta, Ritwik and Lütjens, Björn and Will, Justus C. and Behrens, Gunnar and Busecke, Julius and Loose, Nora and Stern, Charles and Beucler, Tom and Harrop, Bryce and Hillman, Benjamin and Jenney, Andrea and Ferretti, Savannah L. and Liu, Nana and Anandkumar, Animashree and Brenowitz, Noah and Eyring, Veronika and Geneva, Nicholas and Gentine, Pierre and Mandt, Stephan and Pathak, Jaideep and Subramaniam, Akshay and Vondrick, Carl and Yu, Rose and Zanna, Laure and Zheng, Tian and Abernathey, Ryan and Ahmed, Fiaz and Bader, David and Baldi, Pierre and Barnes, Elizabeth and Bretherton, Christopher and Caldwell, Peter and Chuang, Wayne and Han, Yilun and Huang, Yu and Iglesias-Suarez, Fernando and Jantre, Sanket and Kashinath, Karthik and Khairoutdinov, Marat and Kurth, Thorsten and Lutsko, Nicholas and Ma, Po-Lun and Mooers, Griffin and Neelin, J. David and Randall, David and Shamekh, Sara and Taylor, Mark and Urban, Nathan and Yuval, Janni and Zhang, Guang and Pritchard, Mike},
booktitle = {Neural Information Processing Systems},
year = {2023},
url = {https://mlanthology.org/neurips/2023/yu2023neurips-climsim/}
}