SmokeViz: A Large-Scale Satellite Dataset for Wildfire Smoke Detection and Segmentation

Abstract

The global rise in wildfire frequency and intensity over the past decade underscores the need for improved fire monitoring techniques. To advance deep learning research on wildfire detection and its associated human health impacts, we introduce **SmokeViz**, a large-scale machine learning dataset of smoke plumes in satellite imagery. The dataset is derived from expert annotations created by smoke analysts at the National Oceanic and Atmospheric Administration (NOAA), which provide coarse temporal and spatial approximations of smoke presence. To enhance annotation precision, we propose **pseudo-label dimension reduction (PLDR)**, a generalizable method that applies pseudo-labeling to refine datasets with mismatched temporal and/or spatial resolutions. Unlike typical pseudo-labeling applications, which aim to increase the number of labeled samples, PLDR keeps the original labels but improves dataset quality by solving for intermediary pseudo-labels (IPLs) that align each annotation with the most representative input data. For SmokeViz, a parent model produces IPLs to identify the single satellite image within each annotation's time window that best corresponds to the smoke plume. This refinement process yields a succinct and relevant deep learning dataset of over 160,000 manual annotations. The SmokeViz dataset is expected to be a valuable resource for developing further wildfire-related machine learning models and is publicly available at https://noaa-gsl-experimental-pds.s3.amazonaws.com/index.html#SmokeViz/.
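The PLDR frame-selection step described above can be illustrated with a minimal sketch. The abstract does not specify how "best corresponds" is measured, so this example assumes intersection-over-union (IoU) between a thresholded parent-model prediction (the IPL) and the rasterized analyst annotation. The names `select_best_frame`, `iou`, `parent_model`, and `threshold` are hypothetical, and the toy model simply treats pixel values as smoke probabilities; none of this is the authors' implementation.

```python
import numpy as np


def iou(pred, target):
    """Intersection-over-union of two boolean masks (0.0 if both are empty)."""
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return float(inter) / float(union) if union > 0 else 0.0


def select_best_frame(frames, annotation_mask, parent_model, threshold=0.5):
    """Return (index, score) of the frame whose IPL best matches the annotation.

    Assumption: the matching criterion is IoU; the real PLDR criterion may differ.
    """
    scores = []
    for frame in frames:
        prob = parent_model(frame)       # per-pixel smoke probability, same HxW as the mask
        ipl = prob >= threshold          # intermediary pseudo-label as a binary mask
        scores.append(iou(ipl, annotation_mask))
    best = int(np.argmax(scores))        # first index wins on ties
    return best, scores[best]


# Toy example: 8x8 grid, the analyst annotation covers a 4x4 square.
annotation = np.zeros((8, 8), dtype=bool)
annotation[2:6, 2:6] = True

frames = [np.zeros((8, 8)) for _ in range(3)]
frames[0][0:4, 0:4] = 1.0   # plume displaced relative to the annotation
frames[1][2:6, 2:6] = 1.0   # plume aligned with the annotation
# frames[2] stays empty: no smoke visible at that time step


def toy_model(x):
    """Hypothetical stand-in for the parent model: pixel values are probabilities."""
    return x


best, score = select_best_frame(frames, annotation, toy_model)
print(best, score)  # 1 1.0
```

Only the selected frame is kept for each annotation, so the number of labels is unchanged while each label is paired with the image that most closely matches it. This is the distinction from conventional pseudo-labeling that the abstract draws.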

Cite

Text

Koki et al. "SmokeViz: A Large-Scale Satellite Dataset for Wildfire Smoke Detection and Segmentation." Advances in Neural Information Processing Systems, 2025.

Markdown

[Koki et al. "SmokeViz: A Large-Scale Satellite Dataset for Wildfire Smoke Detection and Segmentation." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/koki2025neurips-smokeviz/)

BibTeX

@inproceedings{koki2025neurips-smokeviz,
  title     = {{SmokeViz: A Large-Scale Satellite Dataset for Wildfire Smoke Detection and Segmentation}},
  author    = {Koki, Rey and McCabe, Michael and Kedar, Dhruv and Myers-Dean, Josh and Wade, Annabel and Stewart, Jebb Q. and Kumler-Bonfanti, Christina and Brown, Jed},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/koki2025neurips-smokeviz/}
}