ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models
Abstract
The use of Large Language Models (LLMs) in climate science has recently gained significant attention. However, a critical issue remains: the lack of a comprehensive evaluation framework capable of assessing the quality and scientific validity of model outputs. To address this issue, we develop *ClimaGen* (Climate QA Generator), an adaptive learning framework that generates question-answer pairs from graduate textbooks with climate scientists in the loop. As a result, we present *ClimaQA-Gold*, an expert-annotated benchmark dataset alongside *ClimaQA-Silver*, a large-scale, comprehensive synthetic QA dataset for climate science. Finally, we develop evaluation strategies and compare different LLMs on our benchmarks. Our results offer novel insights into various approaches for enhancing the climate science knowledge of LLMs. ClimaQA's source code is publicly available at https://github.com/Rose-STL-Lab/genie-climaqa
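As a rough illustration of the kind of benchmarking the abstract describes, the sketch below scores a model's free-form answers against expert gold answers with a simple string-overlap metric. The dataset file name, field names, answer_question() stub, and the metric itself are illustrative assumptions, not the actual ClimaQA data format or evaluation code; see the repository linked above for the real framework.

# Minimal sketch of evaluating an LLM on expert-annotated QA pairs.
# The file name, record fields, and answer_question() stub are assumptions
# for illustration only, not the ClimaQA API.
import json
from difflib import SequenceMatcher


def answer_question(question: str) -> str:
    """Placeholder for a call to the LLM under evaluation."""
    return "A stub answer about " + question


def similarity(prediction: str, reference: str) -> float:
    """Crude string-overlap score in [0, 1]; a real evaluation would use
    task-specific metrics (e.g. exact match for multiple choice,
    BLEU or semantic similarity for free-form answers)."""
    return SequenceMatcher(None, prediction.lower(), reference.lower()).ratio()


def evaluate(qa_pairs: list[dict]) -> float:
    """Average similarity of model answers to gold answers."""
    scores = [
        similarity(answer_question(item["question"]), item["answer"])
        for item in qa_pairs
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical file of {"question": ..., "answer": ...} records.
    with open("climaqa_gold.json") as f:
        qa_pairs = json.load(f)
    print(f"Mean similarity score: {evaluate(qa_pairs):.3f}")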
Cite
Text
Manivannan et al. "ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models." International Conference on Learning Representations, 2025.
Markdown
[Manivannan et al. "ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/manivannan2025iclr-climaqa/)
BibTeX
@inproceedings{manivannan2025iclr-climaqa,
  title = {{ClimaQA: An Automated Evaluation Framework for Climate Question Answering Models}},
  author = {Manivannan, Veeramakali Vignesh and Jafari, Yasaman and Eranky, Srikar and Ho, Spencer and Yu, Rose and Watson-Parris, Duncan and Ma, Yian and Bergen, Leon and Berg-Kirkpatrick, Taylor},
  booktitle = {International Conference on Learning Representations},
  year = {2025},
  url = {https://mlanthology.org/iclr/2025/manivannan2025iclr-climaqa/}
}