ClimSim-Online: A Large Multi-Scale Dataset and Framework for Hybrid Physics-ML Climate Emulation
Abstract
Modern climate projections lack adequate spatial and temporal resolution due to computational constraints, leading to inaccuracies in representing critical processes like thunderstorms that occur on the sub-resolution scale. Hybrid methods combining physics with machine learning (ML) offer faster, higher fidelity climate simulations by outsourcing compute-hungry, high-resolution simulations to ML emulators. However, these hybrid physics-ML simulations require domain-specific data and workflows that have been inaccessible to many ML experts. This paper is an extended version of our NeurIPS award-winning ClimSim dataset paper. The ClimSim dataset includes 5.7 billion pairs of multivariate input/output vectors spanning ten years at high temporal resolution, capturing the influence of high-resolution, high-fidelity physics on a host climate simulator's macro-scale state. In this extended version, we introduce a significant new contribution in Section 5, which provides a cross-platform, containerized pipeline to integrate ML models into operational climate simulators for hybrid testing. We also implement various baselines of ML models and hybrid simulators to highlight the ML challenges of building stable, skillful emulators. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res, also in a low-resolution version at https://huggingface.co/datasets/LEAP/ClimSim_low-res and an aquaplanet version at https://huggingface.co/datasets/LEAP/ClimSim_low-res_aqua-planet) and code (https://leap-stc.github.io/ClimSim and https://github.com/leap-stc/climsim-online) are publicly released to support the development of hybrid physics-ML and high-fidelity climate simulations.
Cite
Text
Yu et al. "ClimSim-Online: A Large Multi-Scale Dataset and Framework for Hybrid Physics-ML Climate Emulation." Journal of Machine Learning Research, 2025.
Markdown
[Yu et al. "ClimSim-Online: A Large Multi-Scale Dataset and Framework for Hybrid Physics-ML Climate Emulation." Journal of Machine Learning Research, 2025.](https://mlanthology.org/jmlr/2025/yu2025jmlr-climsimonline/)
BibTeX
@article{yu2025jmlr-climsimonline,
title = {{ClimSim-Online: A Large Multi-Scale Dataset and Framework for Hybrid Physics-ML Climate Emulation}},
author = {Yu, Sungduk and Hu, Zeyuan and Subramaniam, Akshay and Hannah, Walter and Peng, Liran and Lin, Jerry and Bhouri, Mohamed Aziz and Gupta, Ritwik and Lütjens, Björn and Will, Justus C. and Behrens, Gunnar and Busecke, Julius J. M. and Loose, Nora and Stern, Charles I and Beucler, Tom and Harrop, Bryce and Heuer, Helge and Hillman, Benjamin R and Jenney, Andrea and Liu, Nana and White, Alistair and Zheng, Tian and Kuang, Zhiming and Ahmed, Fiaz and Barnes, Elizabeth and Brenowitz, Noah D. and Bretherton, Christopher and Eyring, Veronika and Ferretti, Savannah and Lutsko, Nicholas and Gentine, Pierre and Mandt, Stephan and Neelin, J. David and Yu, Rose and Zanna, Laure and Urban, Nathan M. and Yuval, Janni and Abernathey, Ryan and Baldi, Pierre and Chuang, Wayne and Huang, Yu and Iglesias-Suarez, Fernando and Jantre, Sanket and Ma, Po-Lun and Shamekh, Sara and Zhang, Guang and Pritchard, Michael},
journal = {Journal of Machine Learning Research},
year = {2025},
pages = {1--85},
volume = {26},
url = {https://mlanthology.org/jmlr/2025/yu2025jmlr-climsimonline/}
}